diff --git a/DirectProgramming/C++SYCL_FPGA/README.md b/DirectProgramming/C++SYCL_FPGA/README.md index 1c09366556..d6be0de2b7 100644 --- a/DirectProgramming/C++SYCL_FPGA/README.md +++ b/DirectProgramming/C++SYCL_FPGA/README.md @@ -269,8 +269,6 @@ qsub -I -l nodes=1:fpga_runtime:ppn=2 -d . Only `fpga_compile` nodes support compiling to FPGA. When compiling for FPGA hardware, increase the job timeout to 24 hours. -Executing programs on FPGA hardware is only supported on `fpga_runtime` nodes of the appropriate type, such as `fpga_runtime:arria10` or `fpga_runtime:stratix10`. - Neither compiling nor executing programs on FPGA hardware are supported on the login nodes. For more information, see the [Intel® oneAPI Base Toolkit Get Started Guide](https://devcloud.intel.com/oneapi/documentation/base-toolkit/). >**Note**: Since Intel® DevCloud for oneAPI includes the appropriate development environment already configured for you, you do not need to set environment variables. diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/README.md index 4ab42594ac..d0c85dfb36 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/README.md @@ -37,7 +37,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel Xeon® CPU E5-1650 v2 @ 3.50GHz (host machine) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -48,6 +48,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -149,17 +151,26 @@ The design uses the following generic header files. ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -181,23 +192,27 @@ The design uses the following generic header files. 
make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/anr.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/anr.fpga.tar.gz). - ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -229,11 +244,11 @@ The design uses the following generic header files. ``` ./anr.fpga_emu ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./anr.fpga_sim ``` -3. Alternatively, run the sample on the FPGA device. +3. Alternatively, run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). 
``` ./anr.fpga ``` @@ -244,13 +259,13 @@ The design uses the following generic header files. ``` anr.fpga_emu.exe ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 anr.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Alternatively, run the sample on the FPGA device. +3. Alternatively, run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` anr.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/CMakeLists.txt index d6ee2236af..c7ef09ab18 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/CMakeLists.txt @@ -6,12 +6,36 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + + set(BSP_FLAG "") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") + else() + message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. 
\ + Please make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or \ + -DDEVICE_FLAG=Agilex.") + endif() + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(BSP_FLAG "-DIS_BSP") + else() + set(BSP_FLAG "") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.") + message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.") + endif() endif() # These are Windows-specific flags: @@ -46,11 +70,11 @@ endif() # e.g. cmake .. -DSEED=7 if(NOT DEFINED SEED) # the default seed - if(FPGA_DEVICE MATCHES ".*a10.*") + if(DEVICE_FLAG MATCHES "A10") set(SEED 1) - elseif(FPGA_DEVICE MATCHES ".*s10.*") + elseif(DEVICE_FLAG MATCHES "S10") set(SEED 2) - elseif(FPGA_DEVICE MATCHES ".*agilex.*") + elseif(DEVICE_FLAG MATCHES "Agilex") set(SEED 3) else() set(SEED 4) @@ -79,11 +103,11 @@ if(PIXELS_PER_CYCLE) message(STATUS "PIXELS_PER_CYCLE explicitly set to ${PIXELS_PER_CYCLE}") else() # Default PIXELS_PER_CYCLE based on the board being used - if(FPGA_DEVICE MATCHES ".*a10.*") + if(DEVICE_FLAG MATCHES "A10") set(PIXELS_PER_CYCLE 2) - elseif(FPGA_DEVICE MATCHES ".*s10.*") + elseif(DEVICE_FLAG MATCHES "S10") set(PIXELS_PER_CYCLE 2) - elseif(FPGA_DEVICE MATCHES ".*agilex.*") + elseif(DEVICE_FLAG MATCHES "Agilex") set(PIXELS_PER_CYCLE 1) else() message(WARNING "Unknown board: setting PIXELS_PER_CYCLE to 1") @@ -120,13 +144,13 @@ endif() # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. 
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -DFPGA_EMULATOR") -set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG}") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -Xssimulation -DFPGA_SIMULATOR") -set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}") -set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} ${FLAT_COMPILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} ${IP_MODE_FLAG} ${USER_HARDWARE_FLAGS}") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -DFPGA_HARDWARE") -set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG}") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -DFPGA_EMULATOR ${BSP_FLAG}") +set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} ${BSP_FLAG}") +set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -Xssimulation -DFPGA_SIMULATOR ${BSP_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${AC_TYPES_FLAG} ${BSP_FLAG}") +set(REPORT_LINK_FLAGS "-fsycl 
-fintelfpga -Xshardware ${PROFILE_FLAG} ${FLAT_COMPILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${FILTER_SIZE_FLAG} ${PIXELS_PER_CYCLE_FLAG} ${MAX_COLS_FLAG} ${PIXEL_BITS_FLAG} -DFPGA_HARDWARE ${BSP_FLAG}") +set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG} ${BSP_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation ############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/anr.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/anr.hpp index 7ed3d2fe1d..5c0de80dcf 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/anr.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/anr.hpp @@ -347,15 +347,7 @@ std::vector SubmitANRKernels(queue& q, int cols, int rows, // submit the vertical kernel using a column stencil auto vertical_kernel = q.single_task([=] { // copy host side intensity sigma LUT to the device - // For testing the kernel system as an IP and checking the area and Fmax, - // we allow the user to turn off connections to device memory. In this case - // (the DISABLE_DEVICE_MEM macro IS defined), the results will be incorrect - // since there is no way to get the data to/from the device. 
-#if defined(IP_MODE) - IntensitySigmaLUT sig_i_lut; -#else IntensitySigmaLUT sig_i_lut(sig_i_lut_data_ptr); -#endif // build the constexpr exp() and inverse LUT ROMs constexpr ExpLUT exp_lut; diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/dma_kernels.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/dma_kernels.hpp index 905f13f2af..127980e148 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/dma_kernels.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/dma_kernels.hpp @@ -22,11 +22,13 @@ template event SubmitInputDMA(queue &q, T *in_ptr, int rows, int cols, int frames) { using PipeType = DataBundle; +#if defined (IS_BSP) // LSU attribute to turn off caching using NonCachingLSU = ext::intel::lsu, ext::intel::cache<0>, ext::intel::statically_coalesce, ext::intel::prefetch>; +#endif // validate the number of columns if ((cols % pixels_per_cycle) != 0) { @@ -41,7 +43,12 @@ event SubmitInputDMA(queue &q, T *in_ptr, int rows, int cols, int frames) { // Using device memory return q.single_task([=]() [[intel::kernel_args_restrict]] { + +#if defined (IS_BSP) device_ptr in(in_ptr); +#else + T* in(in_ptr); +#endif // coalesce the following two loops into a single for-loop using the // loop_coalesce attribute @@ -51,7 +58,11 @@ event SubmitInputDMA(queue &q, T *in_ptr, int rows, int cols, int frames) { PipeType pipe_data; #pragma unroll for (int k = 0; k < pixels_per_cycle; k++) { +#if defined (IS_BSP) pipe_data[k] = NonCachingLSU::load(in + i * pixels_per_cycle + k); +#else + pipe_data[k] = in[i * pixels_per_cycle + k]; +#endif } Pipe::write(pipe_data); } @@ -77,7 +88,12 @@ event SubmitOutputDMA(queue &q, T *out_ptr, int rows, int cols, int frames) { // Using device memory return q.single_task([=]() [[intel::kernel_args_restrict]] { + +#if defined (IS_BSP) device_ptr out(out_ptr); +#else + T* out(out_ptr); +#endif // coalesce the following two loops into a single for-loop using the // loop_coalesce 
attribute diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/intensity_sigma_lut.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/intensity_sigma_lut.hpp index d35acf52d9..4367fcf5c5 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/intensity_sigma_lut.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/intensity_sigma_lut.hpp @@ -16,6 +16,7 @@ class IntensitySigmaLUT { // default constructor IntensitySigmaLUT() {} +#if defined (IS_BSP) // construct from a device_ptr (for constructing from device memory) IntensitySigmaLUT(device_ptr ptr) { // use a pipelined LSU to load from device memory since we don't @@ -25,6 +26,14 @@ class IntensitySigmaLUT { data_[i] = PipelinedLSU::load(ptr + i); } } +#else + // construct from a regular pointer + IntensitySigmaLUT(float* ptr) { + for (int i = 0; i < lut_depth; i++) { + data_[i] = ptr[i]; + } + } +#endif // construct from the ANR parameters (actually builds the LUT) IntensitySigmaLUT(ANRParams params) { @@ -39,8 +48,12 @@ class IntensitySigmaLUT { } // helper static method to allocate enough memory to hold the LUT - static float* AllocateDevice(sycl::queue& q) { + static float* Allocate(sycl::queue& q) { +#if defined (IS_BSP) float* ptr = sycl::malloc_device(lut_depth, q); +#else + float* ptr = sycl::malloc_shared(lut_depth, q); +#endif if (ptr == nullptr) { std::cerr << "ERROR: could not allocate space for 'ptr'\n"; std::terminate(); @@ -49,7 +62,7 @@ class IntensitySigmaLUT { } // helper method to copy the data to the device - sycl::event CopyDataToDevice(sycl::queue& q, float* ptr) { + sycl::event CopyData(sycl::queue& q, float* ptr) { return q.memcpy(ptr, data_, lut_depth * sizeof(float)); } diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/main.cpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/main.cpp index ad9d8ae466..649744dd88 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/main.cpp +++ 
b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/anr/src/main.cpp @@ -117,6 +117,7 @@ int main(int argc, char* argv[]) { // create the output pixels (initialize to all 0s) std::vector out_pixels(in_pixels.size(), 0); +#if defined (IS_BSP) // allocate memory on the device for the input and output PixelT *in, *out; if ((in = malloc_device(pixel_count, q)) == nullptr) { @@ -127,18 +128,31 @@ int main(int argc, char* argv[]) { std::cerr << "ERROR: could not allocate space for 'out'\n"; std::terminate(); } +#else + // allocate memory on the host for the input and output + PixelT *in, *out; + if ((in = malloc_shared(pixel_count, q)) == nullptr) { + std::cerr << "ERROR: could not allocate space for 'in'\n"; + std::terminate(); + } + if ((out = malloc_shared(pixel_count, q)) == nullptr) { + std::cerr << "ERROR: could not allocate space for 'out'\n"; + std::terminate(); + } +#endif + // copy the input data to the device memory and wait for the copy to finish q.memcpy(in, in_pixels.data(), pixel_count * sizeof(PixelT)).wait(); // allocate space for the intensity sigma LUT - float* sig_i_lut_data_ptr = IntensitySigmaLUT::AllocateDevice(q); + float* sig_i_lut_data_ptr = IntensitySigmaLUT::Allocate(q); // create the intensity sigma LUT data locally on the host IntensitySigmaLUT sig_i_lut_host(params); // copy the intensity sigma LUT to the device - sig_i_lut_host.CopyDataToDevice(q, sig_i_lut_data_ptr).wait(); + sig_i_lut_host.CopyData(q, sig_i_lut_data_ptr).wait(); ////////////////////////////////////////////////////////////////////////////// // track timing information in ms diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/README.md index f74484a9f7..e61f97b305 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/README.md @@ -40,18 +40,22 @@ You can also find more information about 
[troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX FPGA)
Intel® FPGA 3rd party / custom platforms with oneAPI support
**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler -> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. -> -> For using the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH: -> - Questa*-Intel® FPGA Edition -> - Questa*-Intel® FPGA Starter Edition -> - ModelSim® SE +> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. +> +> For using the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH: +> - Questa*-Intel® FPGA Edition +> - Questa*-Intel® FPGA Starter Edition +> - ModelSim® SE +> +> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. > -> When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. - +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. + +> :warning: This sample is benchmarking an FPGA board, therefore it should really be used when targeting an FPGA board/BSP. + ## Key Implementation Details A oneAPI Board Support Package (BSP) consists of software layers and an FPGA hardware scaffold design, making it possible to target an FPGA through the Intel® oneAPI DPC++/C++ Compiler. @@ -118,21 +122,26 @@ Performance results are based on testing as of Jan 31, 2022. ### On Linux* 1. Change to the sample directory. -2. 
Configure the build system for **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® PAC with Intel Arria® 10 GX FPGA**, enter the following: - ``` - cmake -DFPGA_DEVICE=intel_a10gx_pac:pac_a10 .. - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system, and enter a command similar to the following example: - ``` - cmake -DFPGA_DEVICE=: .. - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -150,27 +159,28 @@ Performance results are based on testing as of Jan 31, 2022. make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/board_test.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/board_test.fpga.tar.gz). - ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, which is the default. +2. 
Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - To compile for the **Intel® PAC with Intel Arria® 10 GX FPGA**, enter the following: - ``` - cmake -G "NMake Makefiles" -DFPGA_DEVICE=intel_a10gx_pac:pac_a10 .. - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system, and enter a command similar to the following example: - ``` - cmake -G "NMake Makefiles" -DFPGA_DEVICE=: .. - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -229,7 +239,7 @@ To view test details and usage information using the binary, use the `-help` opt ``` ./board_test.fpga_emu ``` - 2. Run the sample on the FPGA device. + 2. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./board_test.fpga ``` @@ -247,6 +257,14 @@ To view test details and usage information using the binary, use the `-help` opt ``` board_test.exe -test= ``` + 2. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). + ``` + ./board_test.fpga.exe + ``` + By default the program runs all tests. 
To run a specific test, enter the test number as an argument to the `-test` option: + ``` + ./board_test.fpga.exe -test= + ``` ## Example Output diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/src/CMakeLists.txt index e11d99f43b..01c6b46987 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/board_test/src/CMakeLists.txt @@ -9,12 +9,17 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_s10sx_pac:pac_s10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Stratix(R) 10 SX FPGA). \ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") +endif() + +# Check if the target is a BSP +if(NOT FPGA_DEVICE MATCHES ".*:.*") + message(STATUS "This sample is made to target BSPs as this is a benchmarking sample.") endif() # This is a Windows-specific flag that enables error handling in host code diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/README.md index f38d47df95..209172f2fe 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/README.md @@ -44,7 +44,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | 
Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware |Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel Xeon® CPU E5-1650 v2 @ 3.50GHz (host machine) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -55,6 +55,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ### Performance @@ -145,16 +147,26 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, and ` ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX** FPGA, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. + ``` mkdir build cd build cmake .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following command instead: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). 
@@ -176,23 +188,28 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, and ` make fpga ``` - (Optional) The hardware compile may take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/cholesky.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/cholesky.fpga.tar.gz). - ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for the Intel® PAC with Intel Arria® 10 GX FPGA, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - For the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), enter the following command instead: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device): ``` @@ -236,7 +253,7 @@ You can apply the Cholesky decomposition to a number of matrices, as shown below ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./cholesky.fpga_sim ``` -3. Run the sample on the FPGA device. +3. 
Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./cholesky.fpga ``` @@ -253,7 +270,7 @@ You can apply the Cholesky decomposition to a number of matrices, as shown below cholesky.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` cholesky.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/CMakeLists.txt index e52aa0d3d3..0dd6a5a000 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/CMakeLists.txt @@ -7,42 +7,53 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + set(BSP_FLAG "") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") -endif() + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") + endif() + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") -# This is a Windows-specific flag that enables error handling in host code -if(WIN32) - set(PLATFORM_SPECIFIC_COMPILE_FLAGS "/EHsc /Qactypes /Wall /fp:precise") - set(PLATFORM_SPECIFIC_LINK_FLAGS "/Qactypes /fp:precise") -else() - set(PLATFORM_SPECIFIC_COMPILE_FLAGS "-qactypes -Wall -fno-finite-math-only -fp-model=precise") - set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise") + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(BSP_FLAG "-DIS_BSP") + else() + set(BSP_FLAG "") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.") + message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.") + endif() endif() +if(NOT DEFINED DEVICE_FLAG) + message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. 
\ + Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.") +endif() -# A10 parameters -set(MATRIX_DIMENSION 32) -set(COMPLEX 0) -set(FIXED_ITERATIONS 39) -set(CLOCK_TARGET 360MHz) -set(SEED "-Xsseed=29") -# Overwrite design parameters according to the selected board -if(FPGA_DEVICE MATCHES ".*a10.*") +if(DEVICE_FLAG MATCHES "A10") # A10 parameters - # Nothing to do -elseif(FPGA_DEVICE MATCHES ".*s10.*") + set(MATRIX_DIMENSION 32) + set(COMPLEX 0) + set(FIXED_ITERATIONS 39) + set(CLOCK_TARGET 360MHz) + set(SEED "-Xsseed=29") +elseif(DEVICE_FLAG MATCHES "S10") # S10 parameters set(MATRIX_DIMENSION 32) set(COMPLEX 0) set(FIXED_ITERATIONS 44) set(CLOCK_TARGET 450MHz) set(SEED "-Xsseed=5") -elseif(FPGA_DEVICE MATCHES ".*agilex.*") +elseif(DEVICE_FLAG MATCHES "Agilex") # Agilex™ parameters set(MATRIX_DIMENSION 32) set(FIXED_ITERATIONS 45) @@ -50,8 +61,16 @@ elseif(FPGA_DEVICE MATCHES ".*agilex.*") set(CLOCK_TARGET 520MHz) set(SEED "-Xsseed=5") else() - message(STATUS "Unknown board ${FPGA_DEVICE}!") - message(STATUS "Using Arria 10 defaults.") + message(FATAL_ERROR "An incorrect DEVICE_FLAG was given. Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.") +endif() + +# This is a Windows-specific flag that enables error handling in host code +if(WIN32) + set(PLATFORM_SPECIFIC_COMPILE_FLAGS "/EHsc /Qactypes /Wall /fp:precise") + set(PLATFORM_SPECIFIC_LINK_FLAGS "/Qactypes /fp:precise") +else() + set(PLATFORM_SPECIFIC_COMPILE_FLAGS "-qactypes -Wall -fno-finite-math-only -fp-model=precise") + set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise") endif() if(IGNORE_DEFAULT_SEED) @@ -79,12 +98,12 @@ message(STATUS "SEED=${SEED}") # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. 
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR") -set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS}") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_SIMULATOR_FLAGS}") -set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE") -set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR ${BSP_FLAG}") +set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}") +set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_SIMULATOR_FLAGS} ${BSP_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga 
${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE ${BSP_FLAG}") +set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/cholesky.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/cholesky.hpp index d0729d4a29..80bc2df92d 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/cholesky.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/cholesky.hpp @@ -54,8 +54,14 @@ void CholeskyDecompositionImpl( sycl::ext::intel::pipe; // Allocate FPGA DDR memory. 
+#if defined (IS_BSP) TT *a_device = sycl::malloc_device(kAMatrixSize * matrix_count, q); TT *l_device = sycl::malloc_device(kLMatrixSize * matrix_count, q); +#else + // malloc_device is not supported when targeting an FPGA part/family + TT *a_device = sycl::malloc_shared(kAMatrixSize * matrix_count, q); + TT *l_device = sycl::malloc_shared(kLMatrixSize * matrix_count, q); +#endif if ((a_device == nullptr) || (l_device == nullptr)) { std::cerr << "Error when allocating FPGA DDR" << std::endl; @@ -93,8 +99,6 @@ void CholeskyDecompositionImpl( constexpr int kLoopIter = (kLMatrixSize / kNumElementsPerDDRBurst) + kExtraIteration; - sycl::device_ptr vector_ptr_device(l_device); - // Repeat matrix_count complete L matrix pipe reads // for as many repetitions as needed // The loop coalescing directive merges the two outer loops together @@ -105,6 +109,18 @@ void CholeskyDecompositionImpl( for (int li = 0; li < kLoopIter; li++) { TT bank[kNumElementsPerDDRBurst]; +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. + // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. 
+ sycl::device_ptr vector_ptr(l_device); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + TT* vector_ptr(l_device); +#endif + for (int k = 0; k < kNumElementsPerDDRBurst; k++) { if (((li * kNumElementsPerDDRBurst) + k) < kLMatrixSize) { bank[k] = LMatrixPipe::read(); @@ -117,7 +133,7 @@ void CholeskyDecompositionImpl( #pragma unroll for (int k = 0; k < kNumElementsPerDDRBurst; k++) { if (((li * kNumElementsPerDDRBurst) + k) < kLMatrixSize) { - vector_ptr_device[(matrix_idx * kLMatrixSize) + + vector_ptr[(matrix_idx * kLMatrixSize) + (li * kNumElementsPerDDRBurst) + k] = bank[k]; } } @@ -125,7 +141,7 @@ void CholeskyDecompositionImpl( // Write a burst of kNumElementsPerDDRBurst elements to DDR #pragma unroll for (int k = 0; k < kNumElementsPerDDRBurst; k++) { - vector_ptr_device[(matrix_idx * kLMatrixSize) + + vector_ptr[(matrix_idx * kLMatrixSize) + (li * kNumElementsPerDDRBurst) + k] = bank[k]; } } diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/memory_transfers.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/memory_transfers.hpp index 6d25905d41..f587925870 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/memory_transfers.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky/src/memory_transfers.hpp @@ -38,8 +38,6 @@ void MatrixReadFromDDRToPipe( // Size of a full matrix constexpr int kMatrixSize = rows * columns; - sycl::device_ptr matrix_ptr_device(matrix_ptr); - // Repeatedly read matrix_count matrices from DDR and send them to the pipe for (int repetition = 0; repetition < repetitions; repetition++) { for (int matrix_index = 0; matrix_index < matrix_count; matrix_index++) { @@ -47,6 +45,18 @@ void MatrixReadFromDDRToPipe( // Only useful in the case of kIncompleteBurst int load_index = 0; +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. 
+ // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. + sycl::device_ptr matrix_ptr_located(matrix_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + TT* matrix_ptr_located(matrix_ptr); +#endif + [[intel::initiation_interval(1)]] // NO-FORMAT: Attribute for (ac_int li = 0; li < kLoopIter; li++) { bool last_burst_of_col; @@ -71,12 +81,12 @@ void MatrixReadFromDDRToPipe( // memory address that may be beyond the matrix last address) if (!out_of_bounds) { ddr_read.template get() = - matrix_ptr_device[matrix_index * kMatrixSize + load_index + + matrix_ptr_located[matrix_index * kMatrixSize + load_index + k]; } } else { ddr_read.template get() = - matrix_ptr_device[matrix_index * kMatrixSize + + matrix_ptr_located[matrix_index * kMatrixSize + (int)(li)*num_elem_per_bank + k]; } }); diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/README.md index 88dcfbb230..dfd57273b0 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/README.md @@ -57,7 +57,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -68,6 +68,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ### Performance @@ -167,16 +169,26 @@ Additionaly, the cmake build system can be configured using the following parame ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. + ``` mkdir build cd build cmake .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). 
@@ -198,23 +210,27 @@ Additionaly, the cmake build system can be configured using the following parame make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/cholesky_inversion.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/cholesky_inversion.fpga.tar.gz). - ### On Windows* -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -263,7 +279,7 @@ You can apply the Cholesky-based inversion to 8 matrices repeated a number of ti ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./cholesky_inversion.fpga_sim ``` -3. Run on the FPGA device. +3. Run on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). 
``` ./cholesky_inversion.fpga ``` @@ -280,7 +296,7 @@ You can apply the Cholesky-based inversion to 8 matrices repeated a number of ti cholesky_inversion.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run on the FPGA device. +3. Run on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` cholesky_inversion.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/CMakeLists.txt index 1b464c424e..16f31b9059 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/CMakeLists.txt @@ -7,12 +7,36 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + set(BSP_FLAG "") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") + endif() + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(BSP_FLAG "-DIS_BSP") + else() + set(BSP_FLAG "") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.") + message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.") + endif() +endif() + +if(NOT DEFINED DEVICE_FLAG) + message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. 
\ + Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.") endif() # This is a Windows-specific flag that enables error handling in host code @@ -24,20 +48,15 @@ else() set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise ") endif() - -# A10 parameters -set(MATRIX_DIMENSION 32) -set(COMPLEX 0) -set(FIXED_ITERATIONS_DECOMPOSITION 39) -set(FIXED_ITERATIONS_INVERSION 34) -set(CLOCK_TARGET 360MHz) -set(SEED "-Xsseed=29") - -# Set design parameters according to the selected board -if(FPGA_DEVICE MATCHES ".*a10.*") +if(DEVICE_FLAG MATCHES "A10") # A10 parameters - # Nothing to do -elseif(FPGA_DEVICE MATCHES ".*s10.*") + set(MATRIX_DIMENSION 32) + set(COMPLEX 0) + set(FIXED_ITERATIONS_DECOMPOSITION 39) + set(FIXED_ITERATIONS_INVERSION 34) + set(CLOCK_TARGET 360MHz) + set(SEED "-Xsseed=29") +elseif(DEVICE_FLAG MATCHES "S10") # S10 parameters set(MATRIX_DIMENSION 32) set(COMPLEX 0) @@ -45,7 +64,7 @@ elseif(FPGA_DEVICE MATCHES ".*s10.*") set(FIXED_ITERATIONS_INVERSION 44) set(CLOCK_TARGET 450MHz) set(SEED "-Xsseed=5") -elseif(FPGA_DEVICE MATCHES ".*agilex.*") +elseif(DEVICE_FLAG MATCHES "Agilex") # Agilex™ parameters set(MATRIX_DIMENSION 32) set(FIXED_ITERATIONS_DECOMPOSITION 45) @@ -54,8 +73,7 @@ elseif(FPGA_DEVICE MATCHES ".*agilex.*") set(CLOCK_TARGET 520MHz) set(SEED "-Xsseed=5") else() - message(STATUS "Unknown board ${FPGA_DEVICE}!") - message(STATUS "Using Arria 10 defaults.") + message(FATAL_ERROR "An incorrect DEVICE_FLAG was given. Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.") endif() if(IGNORE_DEFAULT_SEED) @@ -88,12 +106,12 @@ message(STATUS "SEED=${SEED}") # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. 
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR") -set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS}") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_HARDWARE_FLAGS}") -set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} -Xsfp-relaxed") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE") -set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -DFPGA_EMULATOR ${BSP_FLAG}") +set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}") 
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} -Xsfp-relaxed ${BSP_FLAG}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_DECOMPOSITION=${FIXED_ITERATIONS_DECOMPOSITION} -DFIXED_ITERATIONS_INVERSION=${FIXED_ITERATIONS_INVERSION} -DCOMPLEX=${COMPLEX} -DMATRIX_DIMENSION=${MATRIX_DIMENSION} -Xsfp-relaxed -DFPGA_HARDWARE ${BSP_FLAG}") +set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation ############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/cholesky_inversion.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/cholesky_inversion.hpp index 7f67dfdc38..22e8faac39 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/cholesky_inversion.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/cholesky_inversion.hpp @@ -62,8 +62,14 @@ void CholeskyInversionImpl( sycl::ext::intel::pipe; // Allocate FPGA DDR memory. 
+#if defined (IS_BSP) TT *a_device = sycl::malloc_device(kAMatrixSize * matrix_count, q); TT *i_device = sycl::malloc_device(kIMatrixSize * matrix_count, q); +#else + // malloc_device is not supported when targeting an FPGA part/family + TT *a_device = sycl::malloc_shared(kAMatrixSize * matrix_count, q); + TT *i_device = sycl::malloc_shared(kIMatrixSize * matrix_count, q); +#endif if ((a_device == nullptr) || (i_device == nullptr)) { std::cerr << "Error when allocating FPGA DDR" << std::endl; diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/memory_transfers.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/memory_transfers.hpp index 1a40f3915f..4644e6c954 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/memory_transfers.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/cholesky_inversion/src/memory_transfers.hpp @@ -38,7 +38,17 @@ void MatrixReadFromDDRToPipe( // Size of a full matrix constexpr int kMatrixSize = rows * columns; - sycl::device_ptr matrix_ptr_device(matrix_ptr); +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. + // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. 
+ sycl::device_ptr matrix_ptr_located(matrix_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + TT* matrix_ptr_located(matrix_ptr); +#endif // Repeatedly read matrix_count matrices from the DDR and send them to the // pipe @@ -72,12 +82,12 @@ void MatrixReadFromDDRToPipe( // memory address that may be beyond the matrix last address) if (!out_of_bounds) { ddr_read.template get() = - matrix_ptr_device[matrix_index * kMatrixSize + load_index + + matrix_ptr_located[matrix_index * kMatrixSize + load_index + k]; } } else { ddr_read.template get() = - matrix_ptr_device[matrix_index * kMatrixSize + + matrix_ptr_located[matrix_index * kMatrixSize + (int)(li)*num_elem_per_bank + k]; } }); @@ -118,7 +128,17 @@ void VectorReadFromPipeToDDR( constexpr int kExtraIteration = kIncompleteBurst ? 1 : 0; constexpr int kLoopIter = (vector_size / num_elem_per_bank) + kExtraIteration; - sycl::device_ptr vector_ptr_device(vector_ptr); +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. + // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. 
+ sycl::device_ptr vector_ptr_located(vector_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + TT* vector_ptr_located(vector_ptr); +#endif // Repeat vector_count complete I vector pipe reads // for as many repetitions as needed @@ -139,7 +159,7 @@ void VectorReadFromPipeToDDR( #pragma unroll for (int k = 0; k < num_elem_per_bank; k++) { if (((li * num_elem_per_bank) + k) < vector_size) { - vector_ptr_device[(vector_idx * vector_size) + + vector_ptr_located[(vector_idx * vector_size) + (li * num_elem_per_bank) + k] = bank[k]; } } @@ -147,7 +167,7 @@ void VectorReadFromPipeToDDR( // Write a burst of num_elem_per_bank elements to DDR #pragma unroll for (int k = 0; k < num_elem_per_bank; k++) { - vector_ptr_device[(vector_idx * vector_size) + + vector_ptr_located[(vector_idx * vector_size) + (li * num_elem_per_bank) + k] = bank[k]; } } diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/README.md index 4c1e0e1a76..58026eb161 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/README.md @@ -39,7 +39,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -50,6 +50,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ### Performance @@ -151,16 +153,26 @@ This design measures the FPGA performance to determine how many assets can be pr ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. + ``` mkdir build cd build cmake .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). 
@@ -178,23 +190,27 @@ This design measures the FPGA performance to determine how many assets can be pr make fpga ``` - (Optional) As the above hardware compile may take several hours to complete, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/crr.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/crr.fpga.tar.gz). - ### On Windows* -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -229,7 +245,7 @@ This design measures the FPGA performance to determine how many assets can be pr ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./crr.fpga_sim [-o=] ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). 
``` ./crr.fpga [-o=] ``` @@ -250,7 +266,7 @@ This design measures the FPGA performance to determine how many assets can be pr set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` crr.fpga.exe [-o=] ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/src/CMakeLists.txt index 448a5a0769..9e1667f88c 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/crr/src/CMakeLists.txt @@ -6,12 +6,27 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") + endif() + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") +endif() + +if(NOT DEFINED DEVICE_FLAG) + message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. 
\ + Please make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or \ + -DDEVICE_FLAG=Agilex.") endif() # This is a Windows-specific flag that enables error handling in host code @@ -20,19 +35,19 @@ if(WIN32) endif() # Set design parameters according to the selected board -if(FPGA_DEVICE MATCHES ".*a10.*") +if(DEVICE_FLAG MATCHES "A10") # A10 parameters set(OUTER_UNROLL 1) set(INNER_UNROLL 64) set(OUTER_UNROLL_POW2 1) set(SEED "-Xsseed=1") -elseif(FPGA_DEVICE MATCHES ".*s10.*") +elseif(DEVICE_FLAG MATCHES "S10") # S10 parameters set(OUTER_UNROLL 2) set(INNER_UNROLL 64) set(OUTER_UNROLL_POW2 2) set(SEED "-Xsseed=2") -elseif(FPGA_DEVICE MATCHES ".*agilex.*") +elseif(DEVICE_FLAG MATCHES "Agilex") # Agilex™ set(OUTER_UNROLL 2) set(INNER_UNROLL 64) diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/README.md index 31c532ef45..e8202429be 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/README.md @@ -10,7 +10,9 @@ This reference design demonstrates how to use an FPGA to accelerate database que ## Purpose -The database query acceleration sample includes 8 tables and a set of 21 business-oriented queries with broad industry-wide relevance. This reference design shows how four queries can be accelerated using the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) and oneAPI. To do so, we create a set of common database operators (found in the `src/db_utils/` directory) that are combined in different ways to build the four queries. +The database query acceleration sample includes 8 tables and a set of 21 business-oriented queries with broad industry-wide relevance. This reference design shows how four queries can be accelerated using oneAPI. To do so, we create a set of common database operators (found in the `src/db_utils/` directory) that are combined in different ways to build the four queries. 
+ +Note that this design uses a lot of resources and is designed with Intel® Stratix® 10 FPGA capabilities in mind. ## Prerequisites @@ -38,7 +40,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description --- |--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -49,8 +51,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. - -> **Note**: This example design is only officially supported for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX). +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ### Performance @@ -144,7 +146,7 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d ### On Linux* 1. Change to the sample directory. -2. Configure the build system for query number 1. +2. Configure the build system for the default target (the Agilex™ device family). ``` mkdir build cd build @@ -152,6 +154,18 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d ``` `-DQUERY=` can be any of the following query numbers: `1`, `9`, `11` or `12`. + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DQUERY= -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DQUERY= -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). 
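Reviewer sketch (not part of the patch): the `db` `CMakeLists.txt` touched by this change accepts only queries `1`, `9`, `11` and `12`, and stops with `FATAL_ERROR` when query 9 or 11 is combined with an Arria 10 target. The C++ below restates that rule so it can be checked in isolation. The function name `IsSupportedQuery` is hypothetical, and the `device_flag` strings are assumed to be the `A10`, `S10` and `Agilex` values that the CMake code assigns to `DEVICE_FLAG`.

```cpp
#include <string>

// Hypothetical helper, for illustration only. It mirrors two rules from the
// db sample's build files:
//   1. -DQUERY must be one of 1, 9, 11 or 12.
//   2. Queries 9 and 11 are not supported when DEVICE_FLAG is "A10".
bool IsSupportedQuery(int query, const std::string& device_flag) {
  const bool known_query =
      (query == 1 || query == 9 || query == 11 || query == 12);
  if (!known_query) {
    return false;
  }
  // CMake: if (DEVICE_FLAG MATCHES "A10") and QUERY is 9 or 11 -> FATAL_ERROR
  if (device_flag == "A10" && (query == 9 || query == 11)) {
    return false;
  }
  return true;
}
```

For example, `IsSupportedQuery(9, "S10")` is accepted while `IsSupportedQuery(9, "A10")` is rejected, which matches the "Queries 9 and 11 are not supported on Arria 10 devices" message in the CMake file.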
@@ -168,7 +182,7 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d ``` The report resides at `db_report.prj/reports/report.html`. - >**Note**: If you are compiling Query 9 (`-DQUERY=9`), expect a long report generation time. You can download pre-generated reports from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/db.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/db.fpga.tar.gz). + >**Note**: If you are compiling Query 9 (`-DQUERY=9`), expect a long report generation time. 4. Compile for FPGA hardware (longer compile time, targets FPGA device). @@ -178,21 +192,29 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d When building for hardware, the default scale factor is **1**. To use the smaller scale factor of 0.01, add the flag `-DSF_SMALL=1` to the original `cmake` command. For example: `cmake .. -DQUERY=11 -DSF_SMALL=1`. See the [Database files](#database-files) for more information. - (Optional) The hardware compile may take several hours to complete. You can download a pre-compiled binary (compatible with Linux* Ubuntu* 18.04) for an Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/db.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/db.fpga.tar.gz). - ### On Windows* ->**Note**: The FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) does not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for query number 1. +2. Configure the build system for the default target (the Agilex™ device family). ``` mkdir build cd build - cmake -G "NMake Makefiles" -DQUERY=1 + cmake -G "NMake Makefiles" .. -DQUERY=1 ``` `-DQUERY=` can be any of the following query numbers: `1`, `9`, `11` or `12`. 
+ > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DQUERY= -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DQUERY= -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -238,11 +260,11 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d ./db.fpga_emu --dbroot=../data/sf0.01 --test ``` (Optional) Run the design for queries `9`, `11` and `12`. -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./db.fpga_sim --dbroot=../data/sf0.01 --test ``` -3. Run the design on an FPGA device. +3. Run the design on an FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./db.fpga --dbroot=../data/sf1 --test ``` @@ -254,13 +276,13 @@ Query 12 showcases the `MergeJoin` database operator. The block diagram of the d db.fpga_emu.exe --dbroot=../data/sf0.01 --test ``` (Optional) Run the design for queries `9`, `11` and `12`. -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 db.fpga_sim.exe --dbroot=../data/sf0.01 --test set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on an FPGA device. +3. Run the sample on an FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). 
``` db.fpga.exe --dbroot=../data/sf1 --test ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/src/CMakeLists.txt index 339f3e0a5d..63ab8c7ed5 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/db/src/CMakeLists.txt @@ -12,29 +12,29 @@ else() message(STATUS "\tQUERY=${QUERY}") endif() -# select default board based on query -if(${QUERY} EQUAL 1) - set(DEFAULT_BOARD "intel_a10gx_pac:pac_a10") - set(DEFAULT_BOARD_STR "Intel Arria(R) 10 GX") -elseif(${QUERY} EQUAL 9) - set(DEFAULT_BOARD "intel_s10sx_pac:pac_s10") - set(DEFAULT_BOARD_STR "Intel Stratix(R) 10 SX") -elseif(${QUERY} EQUAL 11) - set(DEFAULT_BOARD "intel_s10sx_pac:pac_s10") - set(DEFAULT_BOARD_STR "Intel Stratix(R) 10 SX") -elseif(${QUERY} EQUAL 12) - set(DEFAULT_BOARD "intel_a10gx_pac:pac_a10") - set(DEFAULT_BOARD_STR "Intel Arria(R) 10 GX") -endif() - # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE ${DEFAULT_BOARD}) + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with ${DEFAULT_BOARD_STR} FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") + endif() + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") +endif() + +if(NOT DEFINED DEVICE_FLAG) + message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \ + Please make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or \ + -DDEVICE_FLAG=Agilex.") endif() # This is a Windows-specific flag that enables error handling in host code @@ -53,7 +53,7 @@ endif() # Pick the default seed if the user did not specify one to CMake. 
# We do a seed sweep to find a good seed by default if(NOT DEFINED SEED) - if(${FPGA_DEVICE} MATCHES ".*a10.*") + if(DEVICE_FLAG MATCHES "A10") if(${QUERY} EQUAL 1) set(SEED "-Xsseed=2") elseif(${QUERY} EQUAL 9) @@ -63,7 +63,7 @@ if(NOT DEFINED SEED) elseif(${QUERY} EQUAL 12) set(SEED "-Xsseed=2") endif() - elseif(${FPGA_DEVICE} MATCHES ".*s10.*") + elseif(DEVICE_FLAG MATCHES "S10") if(${QUERY} EQUAL 1) set(SEED "-Xsseed=3") elseif(${QUERY} EQUAL 9) @@ -73,7 +73,7 @@ if(NOT DEFINED SEED) elseif(${QUERY} EQUAL 12) set(SEED "-Xsseed=2") endif() - elseif(${FPGA_DEVICE} MATCHES ".*agilex.*") + elseif(DEVICE_FLAG MATCHES "Agilex") if(${QUERY} EQUAL 1) set(SEED "-Xsseed=2") elseif(${QUERY} EQUAL 9) @@ -93,7 +93,7 @@ if(IGNORE_DEFAULT_SEED) endif() # Error out if trying to run Q9 or Q11 on Arria 10 -if (${FPGA_DEVICE} MATCHES ".*a10.*") +if (DEVICE_FLAG MATCHES "A10") if(${QUERY} EQUAL 9 OR ${QUERY} EQUAL 11) message(FATAL_ERROR "Queries 9 and 11 are not supported on Arria 10 devices") endif() diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/README.md index fcb5b7b8d9..6a6365ad39 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/README.md @@ -36,7 +36,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -47,6 +47,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -302,21 +304,31 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. + ``` mkdir build cd build cmake .. ``` + To select between GZIP and Snappy decompression, use `-DGZIP=1` or `-DSNAPPY=1`. If you do not specify the decompression, the code defaults to **Snappy**. ``` cmake .. -DGZIP=1 cmake .. -DSNAPPY=1 ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) 
@@ -339,14 +351,10 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/decompress.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/decompress.fpga.tar.gz). - ### On Windows* -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build @@ -357,10 +365,19 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl cmake -G "NMake Makefiles" .. -DGZIP=1 cmake -G "NMake Makefiles" .. -DSNAPPY=1 ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). 
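Reviewer sketch (not part of the patch): the README text above lets `-DFPGA_DEVICE` name either a device family/part number or a board variant with a BSP, and the `CMakeLists.txt` files in this change derive `DEVICE_FLAG` from that string by lowercasing it and matching substrings. The C++ below restates that matching order as a plain function so the behavior is easy to reason about. `ClassifyDeviceFlag` is a hypothetical name, and substring search stands in for the CMake `MATCHES ".*a10.*"` style regular expressions.

```cpp
#include <algorithm>
#include <cctype>
#include <optional>
#include <string>

// Hypothetical helper, for illustration only. It follows the same order of
// checks as the CMake code: A10 first, then S10, then Agilex.
std::optional<std::string> ClassifyDeviceFlag(std::string target) {
  // CMake: string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME)
  std::transform(target.begin(), target.end(), target.begin(),
                 [](unsigned char c) {
                   return static_cast<char>(std::tolower(c));
                 });

  auto contains = [&target](const char* needle) {
    return target.find(needle) != std::string::npos;
  };

  if (contains("a10") || contains("arria10")) {
    return std::string("A10");
  }
  if (contains("s10") || contains("stratix10")) {
    return std::string("S10");
  }
  if (contains("agilex")) {
    return std::string("Agilex");
  }
  // No match: the CMake code leaves DEVICE_FLAG undefined here and then
  // raises FATAL_ERROR unless the user passed -DDEVICE_FLAG explicitly.
  return std::nullopt;
}
```

With this sketch, `ClassifyDeviceFlag("intel_s10sx_pac:pac_s10")` yields `S10` and `ClassifyDeviceFlag("Agilex")` yields `Agilex`, while an unrecognized board name yields no value, which corresponds to the case where the build asks for `-DDEVICE_FLAG=A10`, `-DDEVICE_FLAG=S10` or `-DDEVICE_FLAG=Agilex`.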
@@ -391,11 +408,11 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl ``` ./decompress.fpga_emu ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./decompress.fpga_sim ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./decompress.fpga ``` @@ -406,13 +423,13 @@ For `constexpr_math.hpp`, `memory_utils.hpp`, `metaprogramming_utils.hpp`, `tupl ``` decompress.fpga_emu.exe ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 decompress.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` decompress.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/CMakeLists.txt index d01e36b8cd..dc78305402 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/CMakeLists.txt @@ -6,12 +6,36 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + set(BSP_FLAG "") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") + endif() + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(BSP_FLAG "-DIS_BSP") + else() + set(BSP_FLAG "") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.") + message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.") + endif() +endif() + +if(NOT DEFINED DEVICE_FLAG) + message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. 
\ + Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.") endif() # Select between SNAPPY and GZIP decompression @@ -68,11 +92,11 @@ if(IGNORE_DEFAULT_SEED) else() if (NOT DEFINED SEED) # the default seed for each FPGA type - if(FPGA_DEVICE MATCHES ".*a10.*") + if(DEVICE_FLAG MATCHES "A10") set(SEED 1) - elseif(FPGA_DEVICE MATCHES ".*s10.*") + elseif(DEVICE_FLAG MATCHES "S10") set(SEED 2) - elseif(FPGA_DEVICE MATCHES ".*agilex.*") + elseif(DEVICE_FLAG MATCHES "Agilex") set(SEED 3) else() message(STATUS "SEED not defined and no known seed for this board -- defaulting to SEED = 1") @@ -94,13 +118,13 @@ endif() # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. -set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_EMULATOR") -set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG}") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_SIMULATOR") -set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_HARDWARE") -set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} ${FLAT_COMPILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}") -set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG}") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} 
-DFPGA_EMULATOR ${BSP_FLAG}") +set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} ${BSP_FLAG}") +set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_SIMULATOR ${BSP_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${AC_TYPES_FLAG} ${BSP_FLAG}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${CONSTEXPR_STEPS} ${WIN_FLAG} ${AC_TYPES_FLAG} ${LITERALS_PER_CYCLE_FLAG} ${DECOMPRESS_FORMAT_FLAG} -DFPGA_HARDWARE ${BSP_FLAG}") +set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} ${FLAT_COMPILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") +set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG} ${BSP_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation ############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/common/common.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/common/common.hpp index 93b1e9daeb..e379258e89 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/common/common.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/common/common.hpp @@ -321,7 +321,17 @@ sycl::event SubmitProducer(sycl::queue& q, unsigned in_count_padded, // GZIP and SNAPPY designs, we guarantee this in the DecompressBytes // functions in ../gzip/gzip_decompressor.hpp and // ../snappy/snappy_decompressor.hpp respectively. +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. + // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. 
sycl::device_ptr in(in_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + unsigned char* in(in_ptr); +#endif fpga_tools::MemoryToPipe( in, iteration_count); }); @@ -355,7 +365,19 @@ sycl::event SubmitConsumer(sycl::queue& q, unsigned out_count_padded, // elements at once from 'OutPipe' and write them to 'out_ptr'. // For details about the 'false' template parameter, see the SubmitProducer // function above. + +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. + // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. sycl::device_ptr out(out_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + unsigned char* out(out_ptr); +#endif + fpga_tools::PipeToMemory( out, iteration_count); diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/gzip/gzip_metadata_reader.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/gzip/gzip_metadata_reader.hpp index 80b15d04c4..d042458173 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/gzip/gzip_metadata_reader.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/gzip/gzip_metadata_reader.hpp @@ -284,9 +284,22 @@ sycl::event SubmitGzipMetadataReader(sycl::queue& q, int in_count, GzipHeaderData* hdr_data_ptr, int* crc_ptr, int* out_count_ptr) { return q.single_task([=]() [[intel::kernel_args_restrict]] { + +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. + // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. 
sycl::device_ptr hdr_data(hdr_data_ptr); sycl::device_ptr crc(crc_ptr); sycl::device_ptr out_count(out_count_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + GzipHeaderData* hdr_data(hdr_data_ptr); + int* crc(crc_ptr); + int* out_count(out_count_ptr); +#endif // local copies of the output data GzipHeaderData hdr_data_loc; diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/snappy/snappy_reader.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/snappy/snappy_reader.hpp index ed93d66b80..e41a9af598 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/snappy/snappy_reader.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/decompress/src/snappy/snappy_reader.hpp @@ -385,7 +385,17 @@ template ([=] { +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. + // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. sycl::device_ptr preamble_count(preamble_count_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + unsigned* preamble_count(preamble_count_ptr); +#endif *preamble_count = SnappyReader(in_count); }); diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/README.md index ae930552aa..ce1e0c4442 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/README.md @@ -39,7 +39,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -50,18 +50,20 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details The GZIP DEFLATE algorithm uses a GZIP-compatible Lempel-Ziv 77 (LZ77) algorithm for data de-duplication and a GZIP-compatible Static Huffman algorithm for bit reduction. The implementation includes three FPGA accelerated tasks (LZ77, Static Huffman, and CRC). -The FPGA implementation of the algorithm enables either one or two independent GZIP compute engines to operate in parallel on the FPGA. The available FPGA resources constrain the number of engines. By default, the design is parameterized to create a single engine when the design is compiled to target Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA). Two engines are created when compiling for Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX), which is a larger device. +The FPGA implementation of the algorithm enables either one or two independent GZIP compute engines to operate in parallel on the FPGA. The available FPGA resources constrain the number of engines. By default, the design is parameterized to create a single engine when the design is compiled to target an Intel® Arria® 10 FPGA. 
Two engines are created when compiling for Intel® Stratix® 10 or Agilex™ FPGAs, which are larger devices. This reference design contains two variants: "High Bandwidth" and "Low-Latency." - The High Bandwidth variant maximizes system throughput without regard for latency. It transfers input/output SYCL Buffers to FPGA-attached DDR. The kernel then operates on these buffers. - The Low-Latency variant takes advantage of Unified Shared Memory (USM) to avoid these copy operations, allowing the GZIP engine to access input/output buffers in host-memory directly. This reduces latency, but throughput is also reduced. "Latency" in this context is defined as the duration of time between when the input buffer is available in host memory to when the output buffer (i.e., the compressed result) is available in host memory. -The Low-Latency variant is only supported on Intel Stratix® 10 SX. +The Low-Latency variant is only supported on USM-capable BSPs, or when targeting an FPGA family/part number. | Kernel | Description |:--- |:--- @@ -99,14 +101,14 @@ To optimize performance, GZIP leverages techniques discussed in the following FP | `-Xshardware` | Targets FPGA hardware (instead of FPGA emulator). | `-Xsparallel=2` | Uses two cores when compiling the bitstream through Intel® Quartus®. | `-Xsseed=` | Uses a particular seed while running Intel® Quartus®, selected to yield the best Fmax for this design. -| `-Xsnum-reorder=6` | On Intel Stratix® 10 SX only, specify a wider data path for read data from global memory. +| `-Xsnum-reorder=6` | On FPGA boards that have a large memory bandwidth, specify a wider data path for read data from global memory. | `-Xsopt-arg="-nocaching"` | Specifies that cached LSUs should not be used. 
Additionally, the cmake build system can be configured using the following parameter: | cmake option | Description |:--- |:--- -| `-DNUM_ENGINES=<1\|2>` | Specifies that 1 GZIP engine should be compiled when targeting Intel Arria® 10 GX and two engines when targeting Intel Stratix® 10 SX. +| `-DNUM_ENGINES=<1\|2>` | Specifies the number of GZIP engines to compile. ### Performance @@ -114,9 +116,9 @@ Performance results are based on testing as of October 27, 2020. > **Note**: Refer to the [Performance Disclaimers](/DirectProgramming/C++SYCL_FPGA/README.md#performance-disclaimers) section for important performance information. -| Device | Throughput -|:--- |:--- -| Intel® PAC with Intel® Arria® 10 GX FPGA | 1 engine @ 3.4 GB/s +| Device | Throughput +|:--- |:--- +| Intel® PAC with Intel® Arria® 10 GX FPGA | 1 engine @ 3.4 GB/s | Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) | 2 engines @ 4.5 GB/s each = 9.0 GB/s total (High Bandwidth variant) using 120MB+ input
2 engines @ 3.5 GB/s = 7.0 GB/s (Low Latency variant) using 80 KB input ## Build the `GZIP` Design @@ -140,20 +142,28 @@ Performance results are based on testing as of October 27, 2020. ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. + ``` mkdir build cd build cmake .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For the **low latency** version of the design, add `-DLOW_LATENCY=1`. - ``` - cmake .. -DLOW_LATENCY=1 -DFPGA_DEVICE=intel_s10sx_pac:pac_s10_usm - ``` + + For the **low latency** version of the design, add `-DLOW_LATENCY=1` to your `cmake` command. + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -175,27 +185,30 @@ Performance results are based on testing as of October 27, 2020. ``` make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/gzip.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/gzip.fpga.tar.gz). ### On Windows* -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. 
Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For the **low latency** version of the design, add `-DLOW_LATENCY=1`. - ``` - cmake -G "Nmake Makefiles" .. -DLOW_LATENCY=1 -DFPGA_DEVICE=intel_s10sx_pac:pac_s10_usm - ``` + + For the **low latency** version of the design, add `-DLOW_LATENCY=1` to your `cmake` command. + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -227,7 +240,7 @@ Performance results are based on testing as of October 27, 2020. | Argument | Description |:--- |:--- | `` | Specifies the file to be compressed.
Use a 120+ MB file to achieve peak performance.
Use an 80 KB file for the Low Latency variant. -| `-o=` | Specifies the name of the output file. The default name of the output file is `.gz`.
When targeting Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), the single `` is fed to both engines, yielding two identical output files, using `` as the basis for the filenames. +| `-o=` | Specifies the name of the output file. The default name of the output file is `.gz`.
When using two engines, the single `` is fed to both engines, yielding two identical output files, using `` as the basis for the filenames. ### On Linux @@ -241,7 +254,7 @@ Performance results are based on testing as of October 27, 2020. CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./gzip.fpga_sim -o= ``` - 3. Run the sample on the FPGA device. + 3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` aocl initialize acl0 pac_s10_usm ./gzip.fpga -o= @@ -258,7 +271,7 @@ Performance results are based on testing as of October 27, 2020. gzip.fpga_sim.exe -o= set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` - 3. Run the sample on the FPGA device. + 3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` aocl initialize acl0 pac_s10_usm gzip.fpga.exe -o= diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/src/CMakeLists.txt index 56b9aabe00..133c8c1de0 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/gzip/src/CMakeLists.txt @@ -21,12 +21,37 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + + set(IS_BSP "0") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") + endif() + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(IS_BSP "1") + else() + set(IS_BSP "0") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so USM will be enabled by default.") + message(STATUS "If the target is actually a BSP that does not support USM, run cmake with -DIS_BSP=1.") + endif() +endif() + +if(NOT DEFINED DEVICE_FLAG) + message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. 
\ + Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.") endif() # This is a Windows-specific flag that enables error handling in host code @@ -35,7 +60,7 @@ if(WIN32) endif() # Set design parameters according to the selected chip -if(FPGA_DEVICE MATCHES ".*a10.*") +if(DEVICE_FLAG MATCHES "A10") # A10 parameters set(NUM_ENGINES 1) if(DEFINED LOW_LATENCY) @@ -45,7 +70,7 @@ if(FPGA_DEVICE MATCHES ".*a10.*") set(SEED "-Xsseed=4") set(NUM_REORDER "") endif() -elseif(FPGA_DEVICE MATCHES ".*s10.*") +elseif(DEVICE_FLAG MATCHES "S10") # S10 parameters set(NUM_ENGINES 2) if(DEFINED LOW_LATENCY) @@ -57,7 +82,7 @@ elseif(FPGA_DEVICE MATCHES ".*s10.*") # For Low Latency variant this is not necessary since only one channel of global memory is used (host memory). set(NUM_REORDER "-Xsnum-reorder=6") endif() -elseif(FPGA_DEVICE MATCHES ".*agilex.*") +elseif(DEVICE_FLAG MATCHES "Agilex") # Agilex™ set(NUM_ENGINES 2) if(DEFINED LOW_LATENCY) @@ -79,11 +104,10 @@ if(IGNORE_DEFAULT_SEED) set(SEED "") endif() - # Presence of USM host allocations (and whether to turn on enable the low-latency target) is detected automatically by # looking at the name of the BSP, or manually by the user when running CMake. # E.g., cmake .. 
-DUSM_HOST_ALLOCATIONS_ENABLED=1 -if(LOW_LATENCY AND NOT FPGA_DEVICE MATCHES ".usm.*" AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0")) +if((IS_BSP STREQUAL "1") AND LOW_LATENCY AND NOT FPGA_DEVICE MATCHES ".usm.*" AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0")) # Low latency design requires USM, so error out message(FATAL_ERROR "Error: The Low Latency variant of the design requires USM host allocations") endif() diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/README.md index 3dc9f2ef3f..186e1f0bbd 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/README.md @@ -41,7 +41,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15 -| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -52,6 +52,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -116,16 +118,26 @@ For `constexpr_math.hpp`, `pipe_utils.hpp`, and `unrolled_loop.hpp` see the READ ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. + ``` mkdir build cd build cmake .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). 
@@ -147,8 +159,6 @@ For `constexpr_math.hpp`, `pipe_utils.hpp`, and `unrolled_loop.hpp` see the READ make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/merge_sort.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/merge_sort.fpga.tar.gz). - ## Run the `Merge Sort` Program ### On Linux @@ -157,11 +167,11 @@ For `constexpr_math.hpp`, `pipe_utils.hpp`, and `unrolled_loop.hpp` see the READ ``` ./merge_sort.fpga_emu ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./merge_sort.fpga_sim ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./merge_sort.fpga ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/CMakeLists.txt index 917d1e16c9..bf03017f1e 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/CMakeLists.txt @@ -6,12 +6,26 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + + set(IS_BSP "0") + set(BSP_FLAG "") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(IS_BSP "1") + set(BSP_FLAG "-DIS_BSP") + else() + set(IS_BSP "0") + set(BSP_FLAG "") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code and USM will be enabled by default.") + message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code and USM checks are performed.") + endif() endif() # This is a Windows-specific flag that enables error handling in host code @@ -21,7 +35,7 @@ endif() # check if the BSP has USM host allocations or manually enable using host allocations # e.g. cmake .. -DUSE_USM_HOST_ALLOCATIONS=1 -if(FPGA_DEVICE MATCHES ".*usm.*" OR DEFINED USE_USM_HOST_ALLOCATIONS) +if((IS_BSP STREQUAL "0") OR FPGA_DEVICE MATCHES ".*usm.*" OR DEFINED USE_USM_HOST_ALLOCATIONS) set(ENABLE_USM "-DUSM_HOST_ALLOCATIONS") message(STATUS "USM host allocations are enabled") endif() @@ -66,12 +80,12 @@ endif() # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. 
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} -DFPGA_EMULATOR") -set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG}") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -DFPGA_SIMULATOR ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG}") -set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${USER_HARDWARE_FLAGS}") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} -DFPGA_HARDWARE") -set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${USER_HARDWARE_FLAGS}") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} -DFPGA_EMULATOR ${BSP_FLAG}") +set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${BSP_FLAG}") +set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -DFPGA_SIMULATOR ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${BSP_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} -DFPGA_HARDWARE ${BSP_FLAG}") +set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PROFILE_FLAG} -Xsparallel=2 ${SEED_FLAG} -Xstarget=${FPGA_DEVICE} ${ENABLE_USM} ${MERGE_UNITS_FLAG} ${SORT_WIDTH_FLAG} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation 
############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/consume.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/consume.hpp index ccaaf788b6..7d74bae1b5 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/consume.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/consume.hpp @@ -22,7 +22,13 @@ event Consume(queue& q, ValueT* out_ptr, IndexT total_count, IndexT offset, // Creating a device_ptr tells the compiler that this pointer is in // device memory, not host memory, and avoids creating extra connections // to host memory + // This is only done in the case where we target a BSP as device + // pointers are not supported when targeting an FPGA family/part +#if defined(IS_BSP) device_ptr out(out_ptr); +#else + ValueT* out(out_ptr); +#endif for (IndexT i = 0; i < iterations; i++) { // get the data from the pipe diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/produce.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/produce.hpp index 68b945fa98..c5cc08b4fa 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/produce.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/produce.hpp @@ -24,7 +24,13 @@ event Produce(queue& q, ValueT *in_ptr, IndexT count, IndexT in_block_count, // Creating a device_ptr tells the compiler that this pointer is in // device memory, not host memory, and avoids creating extra connections // to host memory + // This is only done in the case where we target a BSP as device + // pointers are not supported when targeting an FPGA family/part +#if defined(IS_BSP) device_ptr in(in_ptr); +#else + ValueT* in(in_ptr); +#endif for (IndexT i = 0; i < iterations; i++) { // read 'k_width' elements from device memory diff --git 
a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/sorting_networks.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/sorting_networks.hpp index 46a7a3d4b8..487bac5a9f 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/sorting_networks.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/merge_sort/src/sorting_networks.hpp @@ -104,7 +104,17 @@ event SortNetworkKernel(queue& q, ValueT* out_ptr, IndexT total_count, const IndexT iterations = total_count / k_width; return q.single_task([=]() [[intel::kernel_args_restrict]] { + // Creating a device_ptr tells the compiler that this pointer is in + // device memory, not host memory, and avoids creating extra connections + // to host memory + // This is only done in the case where we target a BSP as device + // pointers are not supported when targeting an FPGA family/part +#if defined(IS_BSP) device_ptr out(out_ptr); +#else + ValueT* out(out_ptr); +#endif + for (IndexT i = 0; i < iterations; i++) { // read the input data from the pipe sycl::vec data = InPipe::read(); diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/README.md index fabb9cf5c3..669bb3cdd4 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/README.md @@ -49,7 +49,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -60,6 +60,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -119,16 +121,26 @@ The `DataProducer` kernel replaces the input IO pipe in the first image. The spl ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. + ``` mkdir build cd build cmake .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -150,23 +162,28 @@ The `DataProducer` kernel replaces the input IO pipe in the first image. 
The spl make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/mvdr_beamforming.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/mvdr_beamforming.fpga.tar.gz). - ### On Windows* -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - For the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -208,11 +225,11 @@ The general syntax for running the program is shown below and the table describe ``` ./mvdr_beamforming.fpga_emu 1024 ../data . ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. 
``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./mvdr_beamforming.fpga_sim 1024 ../data . ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./mvdr_beamforming.fpga 1024 ../data . ``` @@ -223,13 +240,13 @@ The general syntax for running the program is shown below and the table describe ``` mvdr_beamforming.fpga_emu.exe 1024 ../data . ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 mvdr_beamforming.fpga_sim.exe ../data . set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` mvdr_beamforming.fpga.exe 1024 ../data . ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/src/CMakeLists.txt index 514fd4e447..198c9bd6a2 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/mvdr_beamforming/src/CMakeLists.txt @@ -6,16 +6,26 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + set(IS_BSP "0") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(IS_BSP "1") + else() + set(IS_BSP "0") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so USM will be enabled by default.") + message(STATUS "If the target is actually a BSP that does not support USM, run cmake with -DIS_BSP=1.") + endif() endif() # check if the BSP has USM host allocations -if(FPGA_DEVICE MATCHES ".usm.*") +if((IS_BSP STREQUAL "0") OR FPGA_DEVICE MATCHES ".usm.*") set(ENABLE_USM "-DUSM_HOST_ALLOCATIONS") message(STATUS "USM host allocations are enabled") endif() @@ -90,7 +100,7 @@ set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -fbracket-depth set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} ${ENABLE_USM}") set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -fbracket-depth=512 ${AC_TYPES_FLAG} ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} -DFPGA_SIMULATOR") set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Wall -fbracket-depth=512 ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${REAL_IO_PIPES_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${UDP_LINK_FLAGS} ${AC_TYPES_FLAG} -Xssimulation -Xsghdl") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga ${WIN_FLAG} -fbracket-depth=512 ${AC_TYPES_FLAG} ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${REAL_IO_PIPES_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} 
-FPGA_HARDWARE") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga ${WIN_FLAG} -fbracket-depth=512 ${AC_TYPES_FLAG} ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${REAL_IO_PIPES_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} -DFPGA_HARDWARE") set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Wall -Xshardware -fbracket-depth=512 ${ENABLE_USM} ${SENSOR_SIZE_FLAG} ${NUM_SENSORS_FLAG} ${QRD_MIN_ITERATIONS_FLAG} ${REAL_IO_PIPES_FLAG} ${STREAMING_PIPE_WIDTH_FLAG} ${PROFILE_FLAG} -Xsparallel=2 -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${UDP_LINK_FLAGS}") set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${AC_TYPES_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/README.md index 28b11d95b4..c5b72cbd2d 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/README.md @@ -44,7 +44,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -55,6 +55,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ### Performance @@ -76,7 +78,7 @@ The design uses the `-fp-relaxed` option, which permits the compiler to reorder With this optimization, our FPGA implementation requires 4*m* DSPs to compute the complex floating point dot product or 2*m* DSPs for the real case. The matrix size is constrained by the total FPGA DSP resources available. -By default, the design is parameterized to process 128 × 128 matrices when compiled targeting Intel® PAC with Intel Arria® 10 GX FPGA. It is parameterized to process 256 × 256 matrices when compiled targeting Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), a larger device; however, the design can process matrices from 4 x 4 to 512 x 512. +By default, the design is parameterized to process 128 × 128 matrices when compiled targeting an Intel® Arria® 10 FPGA. It is parameterized to process 256 × 256 matrices when compiled targeting an Intel® Stratix® 10 or Intel® Agilex™ FPGA; however, the design can process matrices from 4 x 4 to 512 x 512.
To optimize the performance-critical loop in its algorithm, the design leverages concepts discussed in the following FPGA tutorials: @@ -135,17 +137,26 @@ Additionaly, the cmake build system can be configured using the following parame ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -167,23 +178,27 @@ Additionaly, the cmake build system can be configured using the following parame make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/qrd.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/qrd.fpga.tar.gz). - ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. 
Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -240,7 +255,7 @@ You can perform the QR decomposition of the set of matrices repeatedly. This ste #### Run on FPGA -1. Run the sample on the FPGA device. +1. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./qrd.fpga ``` @@ -267,7 +282,7 @@ You can perform the QR decomposition of the set of matrices repeatedly. This ste #### Run on FPGA -1. Run the sample on the FPGA device. +1. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). 
``` qrd.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/CMakeLists.txt index b909ab5663..202579b6a9 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/CMakeLists.txt @@ -7,12 +7,36 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + set(BSP_FLAG "") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") + endif() + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(BSP_FLAG "-DIS_BSP") + else() + set(BSP_FLAG "") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.") + message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.") + endif() +endif() + +if(NOT DEFINED DEVICE_FLAG) 
+ message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \ + Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex") endif() # This is a Windows-specific flag that enables error handling in host code @@ -31,38 +55,35 @@ else() endif() -# A10 parameters -set(ROWS_COMPONENT 128) -set(COLS_COMPONENT 128) -set(COMPLEX 1) -set(FIXED_ITERATIONS 64) -set(CLOCK_TARGET 360MHz) -set(SEED "-Xsseed=7") -# Overwrite design parameters according to the selected board -if(FPGA_DEVICE MATCHES ".*a10.*") +if(DEVICE_FLAG MATCHES "A10") # A10 parameters - # Nothing to do -elseif(FPGA_DEVICE MATCHES ".*s10.*") + set(ROWS_COMPONENT 128) + set(COLS_COMPONENT 128) + set(COMPLEX 1) + set(FIXED_ITERATIONS 64) + set(CLOCK_TARGET "-Xsclock=360MHz") + set(SEED "-Xsseed=7") +elseif(DEVICE_FLAG MATCHES "S10") # S10 parameters set(ROWS_COMPONENT 256) set(COLS_COMPONENT 256) set(COMPLEX 1) set(FIXED_ITERATIONS 110) - set(CLOCK_TARGET 480MHz) + set(CLOCK_TARGET "-Xsclock=480MHz") set(SEED "-Xsseed=9") -elseif(FPGA_DEVICE MATCHES ".*agilex.*") +elseif(DEVICE_FLAG MATCHES "Agilex") # Agilex™ parameters set(ROWS_COMPONENT 256) set(COLS_COMPONENT 256) set(FIXED_ITERATIONS 110) set(COMPLEX 1) - set(CLOCK_TARGET 600MHz) + set(CLOCK_TARGET "-Xsclock=600MHz") set(SEED "-Xsseed=5") else() - message(STATUS "Unknown board ${FPGA_DEVICE}!") - message(STATUS "Using Arria 10 defaults.") + message(FATAL_ERROR "Unreachable") endif() + if(IGNORE_DEFAULT_SEED) set(SEED "") endif() @@ -93,13 +114,13 @@ message(STATUS "SEED=${SEED}") # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. 
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_EMULATOR") -set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${STACK_FLAG} ${AC_TYPES_LINK_FLAG}") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} ${USER_HARDWARE_FLAGS}") -set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${STACK_FLAG} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} ${AC_TYPES_LINK_FLAG}") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_HARDWARE") -set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${AC_TYPES_LINK_FLAG}") -set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${STACK_FLAG}") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_EMULATOR ${BSP_FLAG}") +set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${STACK_FLAG} ${AC_TYPES_LINK_FLAG} 
${BSP_FLAG}") +set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${STACK_FLAG} -Xssimulation -Xsghdl ${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} ${AC_TYPES_LINK_FLAG} ${BSP_FLAG}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} ${AC_TYPES_COMPILE_FLAG} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS=${FIXED_ITERATIONS} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_HARDWARE ${BSP_FLAG}") +set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${PLATFORM_SPECIFIC_LINK_FLAGS} ${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${AC_TYPES_LINK_FLAG} ${BSP_FLAG}") +set(HARDWARE_LINK_FLAGS "${REPORT_LINK_FLAGS} ${STACK_FLAG} ${BSP_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation ############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/memory_transfers.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/memory_transfers.hpp index 0e03ed62a5..62f575b87f 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/memory_transfers.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/memory_transfers.hpp @@ -37,7 +37,17 @@ void MatrixReadFromDDRToPipe( // Size of a full matrix constexpr int kMatrixSize = rows * columns; - sycl::device_ptr matrix_ptr_device(matrix_ptr); +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. 
+ // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. + sycl::device_ptr matrix_ptr_located(matrix_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + TT* matrix_ptr_located(matrix_ptr); +#endif // Repeatedly read matrix_count matrices from DDR and sends them to the pipe for (int repetition = 0; repetition < repetitions; repetition++){ @@ -72,12 +82,12 @@ void MatrixReadFromDDRToPipe( // Only perform the DDR reads that are relevant (and don't access a // memory address that may be beyond the matrix last address) if (!out_of_bounds) { - ddr_read.template get() = matrix_ptr_device + ddr_read.template get() = matrix_ptr_located [matrix_index * kMatrixSize + load_index + k]; } } else{ - ddr_read.template get() = matrix_ptr_device + ddr_read.template get() = matrix_ptr_located [matrix_index * kMatrixSize + (int)(li)*num_elem_per_bank + k]; } @@ -128,7 +138,18 @@ void MatrixReadPipeToDDR( // Size of a full matrix constexpr int kMatrixSize = rows * columns; - sycl::device_ptr matrix_ptr_device(matrix_ptr); +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. + // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. 
+ sycl::device_ptr matrix_ptr_located(matrix_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + TT* matrix_ptr_located(matrix_ptr); +#endif + // Repeatedly read matrix_count matrices from the pipe and write them to DDR for (int repetition = 0; repetition < repetitions; repetition++){ @@ -161,12 +182,12 @@ void MatrixReadPipeToDDR( // Only perform the DDR writes that are relevant (and don't access a // memory address that may be beyond the buffer last address) if (!out_of_bounds) { - matrix_ptr_device[matrix_index * kMatrixSize + write_idx + k] = + matrix_ptr_located[matrix_index * kMatrixSize + write_idx + k] = pipe_read.template get(); } } else{ - matrix_ptr_device[matrix_index * kMatrixSize + matrix_ptr_located[matrix_index * kMatrixSize + int(li) * num_elem_per_bank + k] = pipe_read.template get(); } diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/qrd.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/qrd.hpp index c86e5f4e95..79dceffa08 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/qrd.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qrd/src/qrd.hpp @@ -61,9 +61,16 @@ void QRDecompositionImpl( kNumElementsPerDDRBurst * 4>; // Allocate FPGA DDR memory. 
+#if defined (IS_BSP)
   TT *a_device = sycl::malloc_device<TT>(kAMatrixSize * matrix_count, q);
   TT *q_device = sycl::malloc_device<TT>(kQMatrixSize * matrix_count, q);
   TT *r_device = sycl::malloc_device<TT>(kRMatrixSize * matrix_count, q);
+#else
+  // malloc_device is not supported when targeting an FPGA part/family
+  TT *a_device = sycl::malloc_shared<TT>(kAMatrixSize * matrix_count, q);
+  TT *q_device = sycl::malloc_shared<TT>(kQMatrixSize * matrix_count, q);
+  TT *r_device = sycl::malloc_shared<TT>(kRMatrixSize * matrix_count, q);
+#endif

   q.memcpy(a_device, a_matrix.data(),
            kAMatrixSize * matrix_count * sizeof(TT)).wait();
@@ -96,7 +103,18 @@ void QRDecompositionImpl(
         ]() [[intel::kernel_args_restrict]] {
           // Read the R matrix from the RMatrixPipe pipe and copy it to the
           // FPGA DDR
-          sycl::device_ptr vector_ptr_device(r_device);
+
+#if defined (IS_BSP)
+          // When targeting a BSP, we instruct the compiler that this pointer
+          // lives on the device.
+          // Knowing this, the compiler won't generate hardware to
+          // potentially get data from the host.
+ sycl::device_ptr vector_ptr_located(r_device); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + TT* vector_ptr_located(r_device); +#endif // Repeat matrix_count complete R matrix pipe reads // for as many repetitions as needed @@ -106,7 +124,7 @@ void QRDecompositionImpl( [[intel::loop_coalesce(2)]] // NO-FORMAT: Attribute for (int matrix_index = 0; matrix_index < matrix_count; matrix_index++) { for (int r_idx = 0; r_idx < kRMatrixSize; r_idx++) { - vector_ptr_device[matrix_index * kRMatrixSize + r_idx] = + vector_ptr_located[matrix_index * kRMatrixSize + r_idx] = RMatrixPipe::read(); } // end of r_idx } // end of repetition_index diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/README.md b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/README.md index 9b6576f876..8073adcf12 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/README.md +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/README.md @@ -44,7 +44,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA (Intel® PAC with Intel® Arria® 10 GX FPGA)
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -55,6 +55,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -127,17 +129,25 @@ Additionaly, the cmake build system can be configured using the following parame ### On Linux* 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10 - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. Compile for emulation (fast compile time, targets emulated FPGA device). 
@@ -159,23 +169,27 @@ Additionaly, the cmake build system can be configured using the following parame make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/qri.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/qri.fpga.tar.gz). - ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Configure the build system for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Configure the build system for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_BOARD=intel_s10sx_pac:pac_s10 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -233,7 +247,7 @@ You can perform the QR-based inversion of the set of matrices repeatedly, as sho #### Run on FPGA -1. Run the sample on the FPGA device. +1. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). 
``` ./qri.fpga ``` @@ -260,7 +274,7 @@ You can perform the QR-based inversion of the set of matrices repeatedly, as sho #### Run on FPGA -1. Run the sample on the FPGA device. +1. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` qri.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/CMakeLists.txt index 0e508ebf5c..2664b38759 100755 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/CMakeLists.txt @@ -7,12 +7,32 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + + set(BSP_FLAG "") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") + endif() + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(BSP_FLAG "-DIS_BSP") + else() + set(BSP_FLAG "") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.") + message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.") + endif() endif() # This is a Windows-specific flag that enables error handling in host code @@ -24,19 +44,16 @@ else() set(PLATFORM_SPECIFIC_LINK_FLAGS "-fp-model=precise") endif() -# A10 parameters -set(ROWS_COMPONENT 32) -set(COLS_COMPONENT 32) -set(COMPLEX 0) -set(FIXED_ITERATIONS_QRD 50) -set(FIXED_ITERATIONS_QRI 36) -set(CLOCK_TARGET 360MHz) -set(SEED "-Xsseed=10") -# Overwrite design parameters according to the selected board -if(FPGA_DEVICE MATCHES ".*a10.*") +if(DEVICE_FLAG MATCHES "A10") # A10 parameters - # Nothing to do -elseif(FPGA_DEVICE MATCHES ".*s10.*") + set(ROWS_COMPONENT 32) + set(COLS_COMPONENT 32) + set(COMPLEX 0) + set(FIXED_ITERATIONS_QRD 50) + set(FIXED_ITERATIONS_QRI 36) + set(CLOCK_TARGET 360MHz) + set(SEED "-Xsseed=10") +elseif(DEVICE_FLAG MATCHES "S10") # S10 
parameters set(ROWS_COMPONENT 32) set(COLS_COMPONENT 32) @@ -45,7 +62,7 @@ elseif(FPGA_DEVICE MATCHES ".*s10.*") set(FIXED_ITERATIONS_QRI 38) set(CLOCK_TARGET 450MHz) set(SEED "-Xsseed=5") -elseif(FPGA_DEVICE MATCHES ".*agilex.*") +elseif(DEVICE_FLAG MATCHES "Agilex") # Agilex™ parameters set(ROWS_COMPONENT 32) set(COLS_COMPONENT 32) @@ -55,8 +72,7 @@ elseif(FPGA_DEVICE MATCHES ".*agilex.*") set(CLOCK_TARGET 520MHz) set(SEED "-Xsseed=5") else() - message(STATUS "Unknown board ${FPGA_DEVICE}!") - message(STATUS "Using Arria 10 defaults.") + message(FATAL_ERROR "An incorrect DEVICE_FLAG was given. Make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or -DDEVICE_FLAG=Agilex.") endif() if(IGNORE_DEFAULT_SEED) @@ -94,12 +110,12 @@ message(STATUS "SEED=${SEED}") # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. 
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_EMULATOR") -set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS}") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -Xsfp-relaxed ${USER_HARDWARE_FLAGS}") -set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} -Xsfp-relaxed") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -Xsfp-relaxed -DFPGA_HARDWARE") -set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -DFPGA_EMULATOR ${BSP_FLAG}") +set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} ${BSP_FLAG}") 
+set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -DFPGA_SIMULATOR -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -Xsfp-relaxed ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xssimulation -Xsghdl -Xsclock=${CLOCK_TARGET} -Xstarget=${FPGA_DEVICE} ${USER_SIMULATOR_FLAGS} -Xsfp-relaxed ${BSP_FLAG}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_COMPILE_FLAGS} -Wformat-security -Werror=format-security -fbracket-depth=512 -DFIXED_ITERATIONS_QRD=${FIXED_ITERATIONS_QRD} -DFIXED_ITERATIONS_QRI=${FIXED_ITERATIONS_QRI} -DCOMPLEX=${COMPLEX} -DROWS_COMPONENT=${ROWS_COMPONENT} -DCOLS_COMPONENT=${COLS_COMPONENT} -Xsfp-relaxed -DFPGA_HARDWARE ${BSP_FLAG}") +set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga ${PLATFORM_SPECIFIC_LINK_FLAGS} -Xshardware -Xsclock=${CLOCK_TARGET} -Xsparallel=2 ${SEED} -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} -Xsfp-relaxed ${BSP_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation ############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/memory_transfers.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/memory_transfers.hpp index 0e03ed62a5..7a57aca79a 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/memory_transfers.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/memory_transfers.hpp @@ -37,7 +37,17 @@ void MatrixReadFromDDRToPipe( // Size of a full matrix constexpr int kMatrixSize = rows * columns; - sycl::device_ptr matrix_ptr_device(matrix_ptr); +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. 
+ // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. + sycl::device_ptr matrix_ptr_located(matrix_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + TT* matrix_ptr_located(matrix_ptr); +#endif // Repeatedly read matrix_count matrices from DDR and sends them to the pipe for (int repetition = 0; repetition < repetitions; repetition++){ @@ -72,12 +82,12 @@ void MatrixReadFromDDRToPipe( // Only perform the DDR reads that are relevant (and don't access a // memory address that may be beyond the matrix last address) if (!out_of_bounds) { - ddr_read.template get() = matrix_ptr_device + ddr_read.template get() = matrix_ptr_located [matrix_index * kMatrixSize + load_index + k]; } } else{ - ddr_read.template get() = matrix_ptr_device + ddr_read.template get() = matrix_ptr_located [matrix_index * kMatrixSize + (int)(li)*num_elem_per_bank + k]; } @@ -128,7 +138,17 @@ void MatrixReadPipeToDDR( // Size of a full matrix constexpr int kMatrixSize = rows * columns; - sycl::device_ptr matrix_ptr_device(matrix_ptr); +#if defined (IS_BSP) + // When targeting a BSP, we instruct the compiler that this pointer + // lives on the device. + // Knowing this, the compiler won't generate hardware to + // potentially get data from the host. 
+ sycl::device_ptr matrix_ptr_located(matrix_ptr); +#else + // Device pointers are not supported when targeting an FPGA + // family/part + TT* matrix_ptr_located(matrix_ptr); +#endif // Repeatedly read matrix_count matrices from the pipe and write them to DDR for (int repetition = 0; repetition < repetitions; repetition++){ @@ -161,12 +181,12 @@ void MatrixReadPipeToDDR( // Only perform the DDR writes that are relevant (and don't access a // memory address that may be beyond the buffer last address) if (!out_of_bounds) { - matrix_ptr_device[matrix_index * kMatrixSize + write_idx + k] = + matrix_ptr_located[matrix_index * kMatrixSize + write_idx + k] = pipe_read.template get(); } } else{ - matrix_ptr_device[matrix_index * kMatrixSize + matrix_ptr_located[matrix_index * kMatrixSize + int(li) * num_elem_per_bank + k] = pipe_read.template get(); } diff --git a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/qri.hpp b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/qri.hpp index 85f4e55b43..ba0e24c4c9 100644 --- a/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/qri.hpp +++ b/DirectProgramming/C++SYCL_FPGA/ReferenceDesigns/qri/src/qri.hpp @@ -68,8 +68,15 @@ void QRIImpl( // Create buffers and allocate space for them. 
+#if defined (IS_BSP)
   TT *a_device = sycl::malloc_device<TT>(kAMatrixSize * matrix_count, q);
   TT *i_device = sycl::malloc_device<TT>(kInverseMatrixSize * matrix_count, q);
+#else
+  // malloc_device is not supported when targeting an FPGA part/family
+  TT *a_device = sycl::malloc_shared<TT>(kAMatrixSize * matrix_count, q);
+  TT *i_device = sycl::malloc_shared<TT>(kInverseMatrixSize * matrix_count, q);
+#endif
+

   q.memcpy(a_device, a_matrix.data(),
            kAMatrixSize * matrix_count * sizeof(TT)).wait();
diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/README.md
index 10848c6f24..d41ec5943b 100755
--- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/README.md
@@ -38,7 +38,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
 | Optimized for | Description
 |:--- |:---
 | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -49,8 +49,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. - ->**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -85,22 +85,25 @@ Typically, these kernels are meant to run forever, and data is streamed to and f ### On Linux* 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. 
(The provided targets match the recommended development flow.) @@ -125,23 +128,25 @@ Typically, these kernels are meant to run forever, and data is streamed to and f ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -177,7 +182,7 @@ Typically, these kernels are meant to run forever, and data is streamed to and f ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./autorun.fpga_sim ``` -3. Run on an FPGA device. +3. Run on an FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). 
``` ./autorun.fpga ``` @@ -198,7 +203,7 @@ Typically, these kernels are meant to run forever, and data is streamed to and f autorun.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run on an FPGA device. +3. Run on an FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` autorun.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/src/CMakeLists.txt index dbfb02daef..524c7c9bc7 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/autorun/src/CMakeLists.txt @@ -6,12 +6,12 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() # This is a Windows-specific flag that enables exception handling in host code diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/buffered_host_streaming/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/buffered_host_streaming/README.md index 3b3d7277fa..1ea01ffe4a 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/buffered_host_streaming/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/buffered_host_streaming/README.md @@ -47,7 +47,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -58,10 +58,11 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ->**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*. +*Notice: SYCL USM host allocations, used in this tutorial, are only supported on FPGA boards that have a USM capable BSP (e.g. the Intel® FPGA PAC D5005 with Intel Stratix® 10 SX with USM support: intel_s10sx_pac:pac_s10_usm) or when targeting an FPGA family/part number. ->**Note**: SYCL* USM host allocations (and the code in this sample) are only supported for the **FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)** with USM support (for example, intel_s10sx_pac:pac_s10_usm). ## Key Implementation Details @@ -92,16 +93,25 @@ This sample demonstrates the following concepts: ### On Linux* 1. Change to the sample directory. -2. Build the program for **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**. +2. Build the program for the Agilex™ device family, which is the default. + ``` mkdir build cd build cmake .. ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake .. 
-DFPGA_DEVICE=: -DUSM_HOST_ALLOCATIONS_ENABLED=1 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -124,23 +134,27 @@ This sample demonstrates the following concepts: make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/buffered_host_streaming.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/buffered_host_streaming.fpga.tar.gz). - ### On Windows* ->**Note**: The Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) does not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Build the program for **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: -DUSM_HOST_ALLOCATIONS_ENABLED=1 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. 
-DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -177,7 +191,7 @@ This sample demonstrates the following concepts: ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./buffered_host_streaming.fpga_sim ``` -3. Run the sample on the FPGA device: +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ``` ./buffered_host_streaming.fpga ``` @@ -194,22 +208,13 @@ This sample demonstrates the following concepts: buffered_host_streaming.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device: +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ``` buffered_host_streaming.fpga.exe ``` ## Example Output -The following results were obtained on a system with the following specification. - -| Area | Description -|:--- |:--- -| CPU | Intel® Xeon® CPU E5-1650 v3 @ 3.50GHz (6 cores, 12 threads) -| CPU Memory | 65 Gb -| Accelerator | Intel® PAC D5005 (with Intel Stratix® 10 SX) -| PCIe | Gen 3.0 x16 - ### Example Output on an FPGA Emulator > **Note**: The FPGA emulator does not accurately represent the performance (throughput or latency) of the kernels. @@ -252,7 +257,7 @@ The following results were obtained on a system with the following specification PASSED ``` -### Example Output on an FPGA Device +### Example Output on an Intel® PAC D5005 (with Intel Stratix® 10 SX) >**Note**: In the performance results shown below the FPGA kernel is **not** the bottleneck of the full system. Instead, the **Producer**/**Consumer** running in parallel are the bottlenecks. (See the [Roofline Analysis](#roofline-analysis) section below for more information.) The full design achieves ~87% of the maximum throughput, as measured by the roofline analysis. 
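The README notice above says USM host allocations need either a USM-capable BSP or an FPGA family/part-number target, and the `CMakeLists.txt` change for this tutorial enforces that at configure time. As a rough illustration only, the same decision rule can be sketched in plain C++. The helper names `IsBspTarget` and `ConfigurationAccepted` and the boolean override parameters are hypothetical. Only the two regular expressions and the accept/reject rule are taken from the CMake check, and `std::regex_search` is assumed to be a close enough stand-in for CMake's partial-match `MATCHES`.

```cpp
#include <regex>
#include <string>

// Hypothetical helpers for illustration; they are not part of the sample.
// The patterns and the decision rule mirror the CMake check in this change.

// Mirrors: IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*"
// CMake's MATCHES succeeds on a partial match, hence regex_search.
inline bool IsBspTarget(const std::string& fpga_device,
                        bool is_bsp_override = false) {
  static const std::regex kPacBsp(".*pac_a10.*|.*pac_s10.*");
  return is_bsp_override || std::regex_search(fpga_device, kPacBsp);
}

// Mirrors the FATAL_ERROR condition: the configuration is rejected only when
// the target is a BSP, its name does not match ".usm.*", and the user did not
// set the override (modeled here as a bool standing in for
// USM_HOST_ALLOCATIONS_ENABLED being defined and not "0").
inline bool ConfigurationAccepted(const std::string& fpga_device,
                                  bool is_bsp_override = false,
                                  bool usm_override = false) {
  static const std::regex kUsmName(".usm.*");
  const bool is_bsp = IsBspTarget(fpga_device, is_bsp_override);
  const bool names_usm = std::regex_search(fpga_device, kUsmName);
  return !(is_bsp && !names_usm && !usm_override);
}
```

Under these assumptions, `ConfigurationAccepted("Agilex")` and `ConfigurationAccepted("intel_s10sx_pac:pac_s10_usm")` are accepted, while `ConfigurationAccepted("intel_a10gx_pac:pac_a10")` is rejected unless the USM override is set.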
diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/buffered_host_streaming/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/buffered_host_streaming/src/CMakeLists.txt
index b6c8d1ac41..b3eda91118 100755
--- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/buffered_host_streaming/src/CMakeLists.txt
+++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/buffered_host_streaming/src/CMakeLists.txt
@@ -7,18 +7,28 @@ set(REPORTS_TARGET ${TARGET_NAME}_report)
 
 # FPGA board selection
 if(NOT DEFINED FPGA_DEVICE)
-    set(FPGA_DEVICE "intel_s10sx_pac:pac_s10_usm")
+    set(FPGA_DEVICE "Agilex")
     message(STATUS "FPGA_DEVICE was not specified.\
-                    \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Stratix(R) 10 SX FPGA with USM support). \
-                    \nPlease refer to the README for information on board selection.")
+                    \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\
+                    \nPlease refer to the README for information on target selection.")
+    set(IS_BSP "0")
 else()
-    message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}")
+    message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}")
+
+    # Check if the target is a BSP
+    if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*")
+        set(IS_BSP "1")
+    else()
+        set(IS_BSP "0")
+        message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so USM will be enabled by default.")
+        message(STATUS "If the target is actually a BSP that does not support USM, run cmake with -DIS_BSP=1.")
+    endif()
 endif()
 
 # this tutorial requires USM host allocations. Check the BSP name (which should contain the text 'usm')
 # to ensure the BSP has the required support. Allow the user to define USM_HOST_ALLOCATIONS_ENABLED
 # to override this check (e.g., cmake .. -DUSM_HOST_ALLOCATIONS_ENABLED=1)
-if(NOT FPGA_DEVICE MATCHES ".usm.*" AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0"))
+if((IS_BSP STREQUAL "1") AND (NOT FPGA_DEVICE MATCHES ".usm.*") AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0"))
     message(FATAL_ERROR "ERROR: This tutorial requires a BSP that has USM host allocations enabled.")
 endif()
 
diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/compute_units/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/compute_units/README.md
index 1eca0d48ca..5779126555 100644
--- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/compute_units/README.md
+++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/compute_units/README.md
@@ -41,7 +41,7 @@ You can also find more information about [troubleshooting build errors](/DirectP
 | Optimized for | Description
 |:--- |:---
 | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
FPGA third-party/custom platforms with oneAPI support +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -52,8 +52,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. - ->**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -134,22 +134,26 @@ Each compute unit in the chain from `Source` to `Sink` must read from a unique p ### On Linux* 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 
1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -170,29 +174,28 @@ Each compute unit in the chain from `Source` to `Sink` must read from a unique p ``` make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/compute_units.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/compute_units.fpga.tar.gz). - ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) 
@@ -225,11 +228,11 @@ Each compute unit in the chain from `Source` to `Sink` must read from a unique p ``` ./compute_units.fpga_emu ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./compute_units.fpga_sim ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./compute_units.fpga ``` @@ -239,13 +242,13 @@ Each compute unit in the chain from `Source` to `Sink` must read from a unique p ``` compute_units.fpga_emu.exe ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 compute_units.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` compute_units.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/compute_units/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/compute_units/src/CMakeLists.txt index dfb0ca6cf9..002416d343 100644 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/compute_units/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/compute_units/src/CMakeLists.txt @@ -6,12 +6,12 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() # This is a Windows-specific flag that enables exception handling in host code diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/double_buffering/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/double_buffering/README.md index 8d992f4108..c101f40679 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/double_buffering/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/double_buffering/README.md @@ -42,7 +42,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
FPGA third-party/custom platforms with oneAPI support +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -53,8 +53,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. - ->**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -84,22 +84,25 @@ The key concepts discussed in this sample are as followed: ### On Linux* 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) 
@@ -121,29 +124,28 @@ The key concepts discussed in this sample are as followed: ``` make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/double_buffering.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/double_buffering.fpga.tar.gz). - ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) 
@@ -168,16 +170,6 @@ The key concepts discussed in this sample are as followed: >**Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your `build` directory in a shorter path, for example `C:\samples\build`. You can then build the sample in the new location, but you must specify the full path to the build files. -#### Troubleshooting - -If an error occurs, you can get more details by running `make` with -the `VERBOSE=1` argument: -``` -make VERBOSE=1 -``` -If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the *[Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html)* for more information on using the utility. - - ## Run the `Double Buffering` Sample ### On Linux @@ -190,7 +182,7 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./double_buffering.fpga_sim ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./double_buffering.fpga ``` @@ -207,7 +199,7 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic double_buffering.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). 
``` double_buffering.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/double_buffering/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/double_buffering/src/CMakeLists.txt index cd4b6e57de..f86ffe6ce0 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/double_buffering/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/double_buffering/src/CMakeLists.txt @@ -6,12 +6,12 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() # This is a Windows-specific flag that enables exception handling in host code diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/README.md index 0d6ddbf035..78acd4a936 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/README.md @@ -38,8 +38,8 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
FPGA third-party/custom platforms with oneAPI support -| Software | Intel® oneAPI DPC++/C++ Compiler
Intel® FPGA Add-On for oneAPI Base Toolkit +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs +| Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. > @@ -49,8 +49,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. - ->**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -107,22 +107,25 @@ Alternatively, there is a hybrid approach that uses some implicit data movement ### On Linux* 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) 
@@ -144,29 +147,27 @@ Alternatively, there is a hybrid approach that uses some implicit data movement ``` make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/explicit_data_movement.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/explicit_data_movement.fpga.tar.gz). - - ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) 
@@ -199,11 +200,11 @@ Alternatively, there is a hybrid approach that uses some implicit data movement ``` ./explicit_data_movement.fpga_emu ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./explicit_data_movement.fpga_sim ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./explicit_data_movement.fpga ``` @@ -214,13 +215,13 @@ Alternatively, there is a hybrid approach that uses some implicit data movement ``` explicit_data_movement.fpga_emu.exe ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 explicit_data_movement.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` explicit_data_movement.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/src/CMakeLists.txt index 83c3f1a58a..edc3cb61a3 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/src/CMakeLists.txt @@ -6,12 +6,22 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + set(BSP_FLAG "") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(BSP_FLAG "-DIS_BSP") + else() + set(BSP_FLAG "") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code.") + message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code.") + endif() endif() # This is a Windows-specific flag that enables exception handling in host code @@ -23,12 +33,12 @@ endif() # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. 
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -DFPGA_EMULATOR") -set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -DFPGA_SIMULATOR") -set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -DFPGA_HARDWARE") -set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -DFPGA_EMULATOR ${BSP_FLAG}") +set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${BSP_FLAG}") +set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -DFPGA_SIMULATOR ${BSP_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -DFPGA_HARDWARE ${BSP_FLAG}") +set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation ############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/src/explicit_data_movement.cpp b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/src/explicit_data_movement.cpp index 18c2dafbe2..2e6fa9085a 100644 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/src/explicit_data_movement.cpp +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/explicit_data_movement/src/explicit_data_movement.cpp @@ -36,8 +36,24 @@ double SubmitImplicitKernel(queue& q, std::vector& in, std::vector& out, // launch the computation kernel auto kernel_event = q.submit([&](handler& h) { + +#if 
defined (IS_BSP) accessor in_a(in_buf, h, read_only); accessor out_a(out_buf, h, write_only, no_init); +#else + // When targeting an FPGA family/part, the compiler does not know + // if the two kernels access the same memory location. + // With this property, we tell the compiler that these buffers + // are in a location "1" whereas the pointers from ExplicitKernel + // are in the default location "0" + sycl::ext::oneapi::accessor_property_list location_of_buffer{ + ext::intel::buffer_location<1>}; + accessor in_a(in_buf, h, read_only, location_of_buffer); + + sycl::ext::oneapi::accessor_property_list location_of_buffer_no_init{ + no_init, ext::intel::buffer_location<1>}; + accessor out_a(out_buf, h, write_only, location_of_buffer_no_init); +#endif h.single_task([=]() [[intel::kernel_args_restrict]] { for (size_t i = 0; i < size; i ++) { @@ -68,9 +84,16 @@ double SubmitImplicitKernel(queue& q, std::vector& in, std::vector& out, template double SubmitExplicitKernel(queue& q, std::vector& in, std::vector& out, size_t size) { +#if defined (IS_BSP) // allocate the device memory T* in_ptr = malloc_device(size, q); T* out_ptr = malloc_device(size, q); +#else + // allocate the shared memory as device memory allocation is not supported + // when targeting an FPGA family/part + T* in_ptr = malloc_shared(size, q); + T* out_ptr = malloc_shared(size, q); +#endif // ensure we successfully allocated the device memory if(in_ptr == nullptr) { @@ -97,9 +120,16 @@ double SubmitExplicitKernel(queue& q, std::vector& in, h.single_task([=]() [[intel::kernel_args_restrict]] { // create device pointers to explicitly inform the compiler these // pointers reside in the device's address space +#if defined (IS_BSP) device_ptr in_ptr_d(in_ptr); device_ptr out_ptr_d(out_ptr); - +#else + // device pointers are not supported + // when targeting an FPGA family/part + T* in_ptr_d(in_ptr); + T* out_ptr_d(out_ptr); +#endif + for (size_t i = 0; i < size; i ++) { out_ptr_d[i] = in_ptr_d[i] * i; } diff
--git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/io_streaming/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/io_streaming/README.md index b74d5d1994..b0024dc6d4 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/io_streaming/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/io_streaming/README.md @@ -38,7 +38,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
FPGA third-party/custom platforms with oneAPI support +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -49,8 +49,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. - ->**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -136,22 +136,26 @@ Notice that the main kernel in the `SubmitSideChannelKernels` function in *src/S ### On Linux* 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 
1. Compile for emulation (fast compile time, targets emulated FPGA device). @@ -172,28 +176,28 @@ Notice that the main kernel in the `SubmitSideChannelKernels` function in *src/S ``` make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/io_streaming.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/io_streaming.fpga.tar.gz). ### On Windows* ->**Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - To compile for the **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) 
@@ -226,11 +230,11 @@ Notice that the main kernel in the `SubmitSideChannelKernels` function in *src/S ``` ./io_streaming.fpga_emu ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./io_streaming.fpga_sim ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./io_streaming.fpga ``` @@ -241,13 +245,13 @@ Notice that the main kernel in the `SubmitSideChannelKernels` function in *src/S ``` io_streaming.fpga_emu.exe ``` -2. Run the sample on the FPGA simulator device: +2. Run the sample on the FPGA simulator device. ``` set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 io_streaming.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` io_streaming.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/io_streaming/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/io_streaming/src/CMakeLists.txt index 2e88ff0ff1..53b28d2fe1 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/io_streaming/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/io_streaming/src/CMakeLists.txt @@ -6,12 +6,22 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + set(IS_BSP "0") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(IS_BSP "1") + else() + set(IS_BSP "0") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so USM will be enabled by default.") + message(STATUS "If the target is actually a BSP that does not support USM, run cmake with -DIS_BSP=1.") + endif() endif() # This is a Windows-specific flag that enables error handling in host code @@ -19,8 +29,8 @@ if(WIN32) set(WIN_FLAG "/EHsc") endif() -# check if the BSP has USM host allocations -if(FPGA_DEVICE MATCHES ".usm.*") +# Use USM host allocations if the BSP supports them or if we target an FPGA part +if((IS_BSP STREQUAL "0") OR FPGA_DEVICE MATCHES ".usm.*") set(USM_HOST_ALLOCATIONS "-DUSM_HOST_ALLOCATIONS") message(STATUS "USM host allocations are enabled") endif() diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/loop_carried_dependency/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/loop_carried_dependency/README.md index 049ddc4b9a..23d74a4fda 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/loop_carried_dependency/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/loop_carried_dependency/README.md @@ -41,10 +41,10 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler -> **Note**: Even though the Intel® DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. +> **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. > > For using the simulator flow, Intel® Quartus® Prime Pro Edition and one of the following simulators must be installed and accessible through your PATH: > - Questa*-Intel® FPGA Edition @@ -52,6 +52,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -122,22 +124,25 @@ Look at the _Compiler Report > Throughput Analysis > Loop Analysis_ section in t > For more information on configuring environment variables, see [Use the setvars Script with Linux* or macOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html). 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. 
Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -162,24 +167,27 @@ Look at the _Compiler Report > Throughput Analysis > Loop Analysis_ section in t make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/loop_carried_dependency.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/loop_carried_dependency.fpga.tar.gz). - - ### On Windows* ->**Note**: The Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) does not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. 
``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: -DUSM_HOST_ALLOCATIONS_ENABLED=1 - ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) @@ -202,15 +210,6 @@ Look at the _Compiler Report > Throughput Analysis > Loop Analysis_ section in t >**Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your `build` directory in a shorter path, for example `C:\samples\build`. You can then build the sample in the new location, but you must specify the full path to the build files. -#### Troubleshooting - -If an error occurs, you can get more details by running `make` with -the `VERBOSE=1` argument: -``` -make VERBOSE=1 -``` -If you receive an error message, troubleshoot the problem using the **Diagnostics Utility for Intel® oneAPI Toolkits**. The diagnostic utility provides configuration and system checks to help find missing dependencies, permissions errors, and other issues. See the *[Diagnostics Utility for Intel® oneAPI Toolkits User Guide](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html)* for more information on using the utility. - ## Run the `Remove Loop Carried Dependency` Sample ### On Linux @@ -223,7 +222,7 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./loop_carried_dependency.fpga_sim ``` -3. 
Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./loop_carried_dependency.fpga ``` @@ -240,7 +239,7 @@ If you receive an error message, troubleshoot the problem using the **Diagnostic loop_carried_dependency.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` loop_carried_dependency.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/loop_carried_dependency/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/loop_carried_dependency/src/CMakeLists.txt index 3d52bdaf17..89db5e6cd3 100644 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/loop_carried_dependency/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/loop_carried_dependency/src/CMakeLists.txt @@ -6,12 +6,12 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() # This is a Windows-specific flag that enables exception handling in host code diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/n_way_buffering/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/n_way_buffering/README.md index 7cf4383846..10e16da590 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/n_way_buffering/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/n_way_buffering/README.md @@ -39,7 +39,7 @@ You can also find more information about [troubleshooting build errors](/DirectP | Optimized for | Description |:--- |:--- | OS | Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler > **Note**: Even though the Intel DPC++/C++ OneAPI compiler is enough to compile for emulation, generating reports and generating RTL, there are extra software requirements for the simulation flow and FPGA compiles. @@ -50,6 +50,8 @@ You can also find more information about [troubleshooting build errors](/DirectP > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Key Implementation Details @@ -162,22 +164,26 @@ After each kernel is launched, the host-side operations (that occur *after* the ### On Linux* 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. + 3. Compile the design. (The provided targets match the recommended development flow.) 1. 
Compile for emulation (fast compile time, targets emulated FPGA device). @@ -201,30 +207,27 @@ After each kernel is launched, the host-side operations (that occur *after* the make fpga ``` - (Optional) The hardware compiles listed above can take several hours to complete; alternatively, you can download FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) from [https://iotdk.intel.com/fpga-precompiled-binaries/latest/n_way_buffering.fpga.tar.gz](https://iotdk.intel.com/fpga-precompiled-binaries/latest/n_way_buffering.fpga.tar.gz). - - ### On Windows* ->**Note**: The Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) does not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - 1. Change to the sample directory. -2. Build the program for **Intel® PAC with Intel Arria® 10 GX FPGA**, which is the default. +2. Build the program for the Agilex™ device family, which is the default. ``` mkdir build cd build cmake -G "NMake Makefiles" .. ``` - For **Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX)**, enter the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - - For a custom FPGA platform, ensure that the board support package is installed on your system then enter a command similar to the following: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 3. Compile the design. (The provided targets match the recommended development flow.) 
@@ -261,7 +264,7 @@ After each kernel is launched, the host-side operations (that occur *after* the ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./n_way_buffering.fpga_sim ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` ./n_way_buffering.fpga ``` @@ -278,7 +281,7 @@ After each kernel is launched, the host-side operations (that occur *after* the n_way_buffering.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device. +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`). ``` n_way_buffering.fpga.exe ``` diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/n_way_buffering/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/n_way_buffering/src/CMakeLists.txt index 73cb4c3657..aaf3e9e6fc 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/n_way_buffering/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/n_way_buffering/src/CMakeLists.txt @@ -6,13 +6,12 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") - + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() # This is a Windows-specific flag that enables exception handling in host code diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/onchip_memory_cache/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/onchip_memory_cache/README.md index 36e5700fe4..63e2a77810 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/onchip_memory_cache/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/onchip_memory_cache/README.md @@ -4,7 +4,7 @@ This FPGA tutorial demonstrates how to build a simple cache (implemented in FPGA | Optimized for | Description --- |--- | OS | Linux* Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel® FPGA 3rd party / custom platforms with oneAPI support
**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler | What you will learn | How and when to implement the on-chip memory cache optimization | Time to complete | 30 minutes @@ -17,6 +17,8 @@ This FPGA tutorial demonstrates how to build a simple cache (implemented in FPGA > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Prerequisites @@ -101,23 +103,26 @@ This tutorial creates multiple kernels sweeping across different cache depths wi ### On a Linux* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - ``` - cmake .. - ``` - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake .. + ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. 
The following build targets are provided, matching the recommended development flow: @@ -137,28 +142,29 @@ This tutorial creates multiple kernels sweeping across different cache depths wi ``` make fpga ``` -3. (Optional) As the above hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded here. ### On a Windows* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. - ``` - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake -G "NMake Makefiles" .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. 
The following build targets are provided, matching the recommended development flow: @@ -179,8 +185,6 @@ This tutorial creates multiple kernels sweeping across different cache depths wi nmake fpga ``` -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - > **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. ## Examining the Reports @@ -208,7 +212,7 @@ Open the Kernel Memory viewer and compare the Load Latency on the loads from ker onchip_memory_cache.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device: +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ``` ./onchip_memory_cache.fpga (Linux) onchip_memory_cache.fpga.exe (Windows) diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/onchip_memory_cache/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/onchip_memory_cache/src/CMakeLists.txt index eafb0596e4..9eef332a47 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/onchip_memory_cache/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/onchip_memory_cache/src/CMakeLists.txt @@ -6,12 +6,12 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() # This is a Windows-specific flag that enables exception handling in host code diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop/README.md index fc594bceeb..6c288f5253 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop/README.md @@ -4,7 +4,7 @@ This FPGA tutorial discusses optimizing the throughput of an inner loop with a l | Optimized for | Description --- |--- | OS | Linux* Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel® FPGA 3rd party / custom platforms with oneAPI support
*__Note__: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler | What you will learn | How to optimize the throughput of an inner loop with a low trip count. | Time to complete | 45 minutes @@ -17,6 +17,8 @@ This FPGA tutorial discusses optimizing the throughput of an inner loop with a l > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Prerequisites @@ -140,23 +142,26 @@ while (Pipe::read()) { ### On a Linux* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - ``` - cmake .. - ``` - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake .. + ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. 
The following build targets are provided, matching the recommended development flow: @@ -176,28 +181,29 @@ while (Pipe::read()) { ``` make fpga ``` -3. (Optional) As the above hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded here. ### On a Windows* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. - ``` - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake -G "NMake Makefiles" .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: @@ -218,7 +224,6 @@ while (Pipe::read()) { nmake fpga ``` -*Note:* The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not support Windows*. 
Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support.
*Note:* If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. ## Examining the Reports @@ -253,7 +258,7 @@ Version 2 of the kernel (`Producer<2>`) explicitly bounds the inner loop trip co loop_carried_dependency.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device: +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ``` ./optimize_inner_loop.fpga (Linux) optimize_inner_loop.fpga.exe (Windows) @@ -282,7 +287,7 @@ You should see the following output in the console: Kernel 2 throughput: 636.29 MB/s PASSED ``` - NOTE: These throughput numbers were collected using the Intel® PAC with Intel Arria® 10 GX FPGA. + NOTE: These throughput numbers were collected using the Intel® PAC with Intel Arria® 10 GX FPGA. ## License diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop/src/CMakeLists.txt index 57737c59e6..3957592d27 100644 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/optimize_inner_loop/src/CMakeLists.txt @@ -6,14 +6,15 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() + # This is a Windows-specific flag that enables exception handling in host code if(WIN32) set(WIN_FLAG "/EHsc") diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/README.md index 0f9046d6f1..f6aa9ec8ec 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/README.md @@ -5,7 +5,7 @@ This FPGA tutorial showcases a design pattern that makes it possible to create a | Optimized for | Description --- |--- | OS | Linux* Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel® FPGA 3rd party / custom platforms with oneAPI support
*__Note__: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler | What you will learn | A design pattern to generate an array of pipes using SYCL*
Static loop unrolling through template metaprogramming | Time to complete | 15 minutes @@ -18,6 +18,8 @@ This FPGA tutorial showcases a design pattern that makes it possible to create a > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Prerequisites @@ -182,24 +184,26 @@ The host must thus enqueue the producer kernel and `kNumRows * kNumCols` separat ### On a Linux* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - ``` - cmake .. - ``` - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake .. -DFPGA_DEVICE=: - ``` - + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake .. + ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. 
The following build targets are provided, matching the recommended development flow: * Compile for emulation (fast compile time, targets emulated FPGA device): @@ -218,28 +222,29 @@ The host must thus enqueue the producer kernel and `kNumRows * kNumCols` separat ``` make fpga ``` -3. (Optional) As the above hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded here. ### On a Windows* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. - ``` - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake -G "NMake Makefiles" .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. 
The following build targets are provided, matching the recommended development flow: @@ -260,8 +265,6 @@ The host must thus enqueue the producer kernel and `kNumRows * kNumCols` separat nmake fpga ``` -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - > **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. ## Examining the Reports @@ -287,7 +290,7 @@ You can visualize the kernels and pipes generated by looking at the "System View pipe_array.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device: +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ``` ./pipe_array.fpga (Linux) pipe_array.fpga.exe (Windows) diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/src/CMakeLists.txt index 1616b37e26..5b8ad5a708 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/pipe_array/src/CMakeLists.txt @@ -6,14 +6,13 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() - # This is a Windows-specific flag that enables exception handling in host code if(WIN32) set(WIN_FLAG "/EHsc") diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/shannonization/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/shannonization/README.md index 26fad1b8f4..b12bbfadf2 100644 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/shannonization/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/shannonization/README.md @@ -4,7 +4,7 @@ This tutorial describes the process of _Shannonization_ (named after [Claude Sha | Optimized for | Description |:--- |:--- | OS | Linux* Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel® FPGA 3rd party / custom platforms with oneAPI support
*__Note__: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler | What you will learn | How to make FPGA-specific optimizations to remove computation from the critical path and improve fMAX/II | Time to complete | 45 minutes @@ -17,6 +17,8 @@ This tutorial describes the process of _Shannonization_ (named after [Claude Sha > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Prerequisites @@ -158,23 +160,26 @@ To achieve an II of 1 for the main `while` loop in the FPGA code shown above, th ### On a Linux* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - ``` - cmake .. - ``` - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake .. + ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. 
Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: @@ -194,10 +199,9 @@ To achieve an II of 1 for the main `while` loop in the FPGA code shown above, th ``` make fpga ``` -3. (Optional) As the above hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded here. ## Examining the Reports -This section will walk through how the HTML reports show the result of the optimizations we made in each version of the kernel, the definition of which can be found in `src/IntersectionKernel.hpp`. Start by locating `report.html` in the `shannonization_report.prj/reports/` directory. Open the report in Chrome*, Firefox*, Edge*, or Internet Explorer*. The fMAX numbers mentioned in these sections assume that the Arria® 10 GX FPGA is the target. However, the discussion is similar for the Stratix® 10 SX FPGA. +This section will walk through how the HTML reports show the result of the optimizations we made in each version of the kernel, the definition of which can be found in `src/IntersectionKernel.hpp`. Start by locating `report.html` in the `shannonization_report.prj/reports/` directory. Open the report in Chrome*, Firefox*, Edge*, or Internet Explorer*. The fMAX numbers mentioned in these sections assume that the Arria® 10 FPGA is the target. However, the discussion is similar for the other targets. #### Version 0 The first version of the kernel, `Intersection<0>`, is the baseline implementation of the intersection kernel. Check the *Details* pane in the *Loop Analysis* tab for the `while` loop in the `Intersection<0>` kernel. You will notice that the *Block Scheduled fMAX* for the `Intersection<0>` kernel is far lower than the target (e.g., ~140 MHz). The *Details* pane shows that the most critical path contains the operations mentioned earlier at the end of the [Algorithm Details](#algorithm-details) Section. 
@@ -273,7 +277,7 @@ However, this places a 32-bit Integer Add Operation back into the critical path In general, these shannonization optimizations create a shift-register that precomputes and *passes* values (additions and comparisons) to the loop's later iterations. The size of the shift-register determines how many *future* iterations we precompute for. In version 1, we precompute for one iteration; in this version, we precompute for 2 iterations. The reports for the `Intersection<2>` should show a critical path with: a single 32-bit Integer Compare Operation (`a < b`), a 32-bit Select Operation (`::read`) and a 1-bit And Operation (`a < b && A_count_inrange`). Thus, we have removed two 32-bit Compare Operations and one 32-bit Add Operation from the critical path. Looking at the *Loop Analysis* pane, you will see that the *Block Scheduled fMAX* is highest for `Intersection<2>` (e.g., 240 MHz). #### Version 3 -As a consequence of the fabric architecture of the Intel Stratix® 10 SX FPGA, the hardware implementation of pipes for the Intel Stratix® 10 SX FPGA has a longer latency for blocking pipe reads and writes. In version 3 of the kernel, `Intersection<3>`, we transform the code to use non-blocking pipe reads. For the Intel® Arria® 10 GX FPGA, this does not have a noticeable difference. However, this transformation allows the design to reach an II of 1 for the Intel Stratix® 10 SX FPGA. +As a consequence of the fabric architecture of the Intel Stratix® 10 SX FPGA, the hardware implementation of pipes for the Intel Stratix® 10 SX FPGA has a longer latency for blocking pipe reads and writes. In version 3 of the kernel, `Intersection<3>`, we transform the code to use non-blocking pipe reads. For the Intel® Arria® 10 FPGA, this does not have a noticeable difference. However, this transformation allows the design to reach an II of 1 for the Intel Stratix® 10 and Intel Agilex™ FPGAs. 
## Running the Sample @@ -286,7 +290,7 @@ As a consequence of the fabric architecture of the Intel Stratix® 10 SX FPGA ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./shannonization.fpga_sim ``` -3. Run the sample on the FPGA device: +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ``` ./shannonization.fpga (Linux) ``` @@ -327,7 +331,7 @@ You should see the following output in the console: Kernel 2 average throughput: 742.257 MB/s PASSED ``` -> **Note**: These throughput numbers were collected using the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX). +> **Note**: These throughput numbers were collected using the Intel® FPGA PAC D5005 with Intel Stratix® 10 SX. ## License diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/shannonization/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/shannonization/src/CMakeLists.txt index edbf970b18..27d9b253c1 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/shannonization/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/shannonization/src/CMakeLists.txt @@ -7,25 +7,27 @@ set(REPORTS_TARGET ${TARGET_NAME}_report) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") - set(DEVICE_FLAG "-DA10") + set(FPGA_DEVICE "Agilex") + set(DEVICE_FLAG "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - if(FPGA_DEVICE MATCHES ".*a10.*") - set(DEVICE_FLAG "-DA10") - elseif(FPGA_DEVICE MATCHES ".*s10.*") - set(DEVICE_FLAG "-DS10") - elseif(FPGA_DEVICE MATCHES ".*agilex.*") - set(DEVICE_FLAG "-DAgilex") + string(TOLOWER ${FPGA_DEVICE} FPGA_DEVICE_NAME) + if(FPGA_DEVICE_NAME MATCHES ".*a10.*" OR FPGA_DEVICE_NAME MATCHES ".*arria10.*") + set(DEVICE_FLAG "A10") + elseif(FPGA_DEVICE_NAME MATCHES ".*s10.*" OR FPGA_DEVICE_NAME MATCHES ".*stratix10.*") + set(DEVICE_FLAG "S10") + elseif(FPGA_DEVICE_NAME MATCHES ".*agilex.*") + set(DEVICE_FLAG "Agilex") endif() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() if(NOT DEFINED DEVICE_FLAG) message(FATAL_ERROR "An unrecognized or custom board was passed, but DEVICE_FLAG was not specified. \ - Please make sure you have set -DDEVICE_FLAG=-DA10 or -DDEVICE_FLAG=-DS10.") + Please make sure you have set -DDEVICE_FLAG=A10, -DDEVICE_FLAG=S10 or \ + -DDEVICE_FLAG=Agilex.") endif() # This is a Windows-specific flag that enables exception handling in host code @@ -33,22 +35,22 @@ if(WIN32) set(WIN_FLAG "/EHsc") endif() +# Disable hyper-optimized handshaking for S10 and Agilex, or when NO_HYPER_OPTIMIZATION is defined +if((DEVICE_FLAG MATCHES "S10") OR (DEVICE_FLAG MATCHES "Agilex") OR (DEFINED NO_HYPER_OPTIMIZATION)) + set(HYPER_FLAG -Xshyper-optimized-handshaking=off) +endif() + # A SYCL ahead-of-time (AoT) compile processes the device code in two stages. # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. 
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -DFPGA_EMULATOR ${DEVICE_FLAG}") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -DFPGA_EMULATOR -D${DEVICE_FLAG}") set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -DFPGA_SIMULATOR ${DEVICE_FLAG}") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} ${DEVICE_FLAG} -DFPGA_HARDWARE") -if(FPGA_DEVICE MATCHES ".s10.*") - # hyper-optimized-handshaking only applies to Intel Stratix® 10 FPGAs - set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware -Xshyper-optimized-handshaking=off -Xstarget=${FPGA_DEVICE} ${DEVICE_FLAG} ${USER_HARDWARE_FLAGS}") - set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xshyper-optimized-handshaking=off -Xstarget=${FPGA_DEVICE} ${DEVICE_FLAG} ${USER_HARDWARE_FLAGS}") -else() - set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware -Xstarget=${FPGA_DEVICE} ${DEVICE_FLAG} ${USER_HARDWARE_FLAGS}") - set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${DEVICE_FLAG} ${USER_HARDWARE_FLAGS}") -endif() +set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -DFPGA_SIMULATOR -D${DEVICE_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xsghdl ${HYPER_FLAG} -Xstarget=${FPGA_DEVICE} -D${DEVICE_FLAG} ${USER_HARDWARE_FLAGS}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -D${DEVICE_FLAG} -DFPGA_HARDWARE") +set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware ${HYPER_FLAG} -Xstarget=${FPGA_DEVICE} -D${DEVICE_FLAG} ${USER_HARDWARE_FLAGS}") + # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation ############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/simple_host_streaming/README.md 
b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/simple_host_streaming/README.md index 8decb023de..6acedbdece 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/simple_host_streaming/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/simple_host_streaming/README.md @@ -5,7 +5,7 @@ This tutorial demonstrates how to use SYCL* Unified Shared Memory (USM) to str | Optimized for | Description --- |--- | OS | Linux* Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel® FPGA 3rd party / custom platforms with oneAPI support (and SYCL USM support)
**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler | What you will learn | How to achieve low-latency host-device streaming while maintaining throughput | Time to complete | 45 minutes @@ -18,8 +18,11 @@ This tutorial demonstrates how to use SYCL* Unified Shared Memory (USM) to str > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. + +> **Notice**: SYCL USM host allocations, used in this tutorial, are only supported on FPGA boards that have a USM-capable BSP (e.g., the Intel® FPGA PAC D5005 with Intel Stratix® 10 SX with USM support: intel_s10sx_pac:pac_s10_usm) or when targeting an FPGA family/part number. -> **Notice**: SYCL USM host allocations (and therefore this tutorial) are only supported for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) with USM support (i.e., intel_s10sx_pac:pac_s10_usm)* > **Notice**: This tutorial demonstrates an implementation of host streaming that will be supplanted by better techniques in a future release. See the [Drawbacks and Future Work](#drawbacks-and-future-work)* @@ -134,18 +137,26 @@ We are currently working on an API and tutorial to address both of these drawbac ### On a Linux* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - ``` - cmake .. - ``` - You can also compile for a custom FPGA platform with SYCL USM support. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake .. 
-DFPGA_DEVICE=: -DUSM_HOST_ALLOCATIONS_ENABLED=1 - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake .. + ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: @@ -165,22 +176,28 @@ We are currently working on an API and tutorial to address both of these drawbac ``` make fpga ``` -3. (Optional) As the above hardware compile may take several hours to complete, an Intel® PAC with Intel Stratix® 10 SX FPGA precompiled binary can be downloaded here. ### On a Windows* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. - ``` - You can also compile for a custom FPGA platform with SYCL USM support. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: -DUSM_HOST_ALLOCATIONS_ENABLED=1 - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake -G "NMake Makefiles" .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. 
-DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: @@ -201,8 +218,6 @@ We are currently working on an API and tutorial to address both of these drawbac nmake fpga ``` -> **Note**: The Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) does not support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - > **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. ## Examining the Reports @@ -226,7 +241,7 @@ Locate `report.html` in the `simple_host_streaming_report.prj/reports/` director simple_host_streaming.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device: +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ``` ./simple_host_streaming.fpga (Linux) simple_host_streaming.fpga.exe (Windows) @@ -258,7 +273,7 @@ You should see the following output in the console: ``` > **Note**: The FPGA emulator does not accurately represent the performance (throughput or latency) of the kernels. -2. When running on the FPGA device +2. 
When running on the Intel® FPGA PAC D5005 with Intel Stratix® 10 SX with USM support: ``` # Chunks: 512 Chunk count: 32768 diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/simple_host_streaming/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/simple_host_streaming/src/CMakeLists.txt index 457b6b5fa5..f149f057a8 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/simple_host_streaming/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/simple_host_streaming/src/CMakeLists.txt @@ -7,18 +7,28 @@ set(REPORTS_TARGET ${TARGET_NAME}_report) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_s10sx_pac:pac_s10_usm") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Stratix(R) 10 SX FPGA with USM support). \ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + set(IS_BSP "0") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(IS_BSP "1") + else() + set(IS_BSP "0") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so USM will be enabled by default.") + message(STATUS "If the target is actually a BSP that does not support USM, run cmake with -DIS_BSP=1.") + endif() endif() # this tutorial requires USM host allocations. Check the BSP name (which should contain the text 'usm') # to ensure the BSP has the required support. 
Allow the user to define USM_HOST_ALLOCATIONS_ENABLED # to override this check (e.g., cmake .. -DUSM_HOST_ALLOCATIONS_ENABLED=1) -if(NOT FPGA_DEVICE MATCHES ".usm.*" AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0")) +if((IS_BSP STREQUAL "1") AND NOT FPGA_DEVICE MATCHES ".usm.*" AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0")) message(FATAL_ERROR "ERROR: This tutorial requires a BSP that has USM host allocations enabled.") endif() diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/triangular_loop/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/triangular_loop/README.md index 68c7652085..06dac37fb0 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/triangular_loop/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/triangular_loop/README.md @@ -6,7 +6,7 @@ This FPGA tutorial demonstrates an advanced technique to improve the performance | Optimized for | Description |:--- |:--- | OS | Linux* Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel® FPGA 3rd party / custom platforms with oneAPI support
**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler | What you will learn | How and when to apply the triangular loop optimization technique | Time to complete | 30 minutes @@ -19,6 +19,8 @@ This FPGA tutorial demonstrates an advanced technique to improve the performance > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Prerequisites @@ -209,24 +211,26 @@ Summing the number of real and dummy iterations gives the total iterations of th ### On a Linux* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - ``` - cmake .. - ``` - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. 
The following build targets are provided, matching the recommended development flow: * Compile for emulation (fast compile time, targets emulated FPGA device): @@ -245,28 +249,29 @@ Summing the number of real and dummy iterations gives the total iterations of th ``` make fpga ``` -3. (Optional) As the above hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded here. ### On a Windows* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. - ``` - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake -G "NMake Makefiles" .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. 
The following build targets are provided, matching the recommended development flow: @@ -287,8 +292,6 @@ Summing the number of real and dummy iterations gives the total iterations of th nmake fpga ``` -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - > **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. ## Examining the Reports @@ -314,7 +317,7 @@ Consult the "Loop Analysis" report to compare the optimized and unoptimized vers triangular_loop.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device: +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ``` ./triangular_loop.fpga (Linux) triangular_loop.fpga.exe (Windows) @@ -346,12 +349,12 @@ Throughput with optimization: 904.489876 MB/s ``` ### Discussion of Results -A test compile of this tutorial design achieved an fMAX of approximately 210 MHz on the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA. The results with and without the optimization are shown in the following table: +A test compile of this tutorial design achieved an fMAX of approximately 210 MHz on the Intel® Programmable Acceleration Card with Intel® Arria® 10 GX FPGA. 
The results with and without the optimization are shown in the following table: -Configuration | Overall Execution Time (ms) | Throughput (MB/s) -|:---|:---|:--- -|Without optimization | 4972 | 25.7 -|With optimization | 161 | 796.6 +Configuration | Overall Execution Time (ms) | Throughput (MB/s) +|:--- |:--- |:--- +|Without optimization | 4972 | 25.7 +|With optimization | 161 | 796.6 Without optimization, the compiler achieved an II of 30 on the inner-loop. With the optimization, the compiler achieves an II of 1, and the throughput increased by approximately 30x. diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/triangular_loop/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/triangular_loop/src/CMakeLists.txt index 47cb4fb14f..98ea0fc38c 100644 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/triangular_loop/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/triangular_loop/src/CMakeLists.txt @@ -6,12 +6,12 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() # This is a Windows-specific flag that enables exception handling in host code diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/README.md index 630cdcb4e4..6d38964c6b 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/README.md @@ -4,7 +4,7 @@ This tutorial demonstrates how to use zero-copy host memory via the SYCL Unified | Optimized for | Description |:--- |:--- | OS | Linux* Ubuntu* 18.04/20.04
RHEL*/CentOS* 8
SUSE* 15
Windows* 10 -| Hardware | Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel® FPGA 3rd party / custom platforms with oneAPI support (and SYCL USM support)
*__Note__: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler | What you will learn | How to use SYCL USM host allocations for the FPGA | Time to complete | 15 minutes @@ -17,8 +17,11 @@ This tutorial demonstrates how to use zero-copy host memory via the SYCL Unified > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. + +*Notice: SYCL USM host allocations, used in this tutorial, are only supported on FPGA boards that have a USM capable BSP (e.g. the Intel® FPGA PAC D5005 with Intel Stratix® 10 SX with USM support: intel_s10sx_pac:pac_s10_usm) or when targeting an FPGA family/part number. -*Notice: SYCL USM host allocations (and therefore this tutorial) are only supported for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) with USM support (i.e., intel_s10sx_pac:pac_s10_usm)* ## Prerequisites @@ -83,19 +86,26 @@ This approach is not considered host streaming since the CPU and FPGA cannot (re ### On a Linux* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - - To compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - ``` - cmake .. - ``` - You can also compile for a custom FPGA platform with SYCL USM support. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake .. -DFPGA_DEVICE=: -DUSM_HOST_ALLOCATIONS_ENABLED=1 - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake .. + ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. 
-DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: @@ -116,23 +126,28 @@ This approach is not considered host streaming since the CPU and FPGA cannot (re make fpga ``` -3. (Optional) As the above hardware compile may take several hours to complete, an Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) precompiled binary (compatible with Linux* Ubuntu* 18.04) can be downloaded here. - ### On a Windows* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - To compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. - ``` - You can also compile for a custom FPGA platform with SYCL USM support. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: -DUSM_HOST_ALLOCATIONS_ENABLED=1 - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake -G "NMake Makefiles" .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. 
The following build targets are provided, matching the recommended development flow: @@ -153,8 +168,6 @@ This approach is not considered host streaming since the CPU and FPGA cannot (re nmake fpga ``` -> **Note**: The Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) does not support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - > **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your `build` directory in a shorter path, for example `c:\samples\build`. You can then run cmake from that directory, and provide cmake with the full path to your sample directory. ## Examining the Reports @@ -167,7 +180,7 @@ Locate `report.html` in the `zero_copy_data_transfer_report.prj/reports/` direct ./zero_copy_data_transfer.fpga_emu (Linux) zero_copy_data_transfer.fpga_emu.exe (Windows) ``` -2. Run the sample on the FPGA simulator: +2. Run the sample on the FPGA simulator (the kernel executes on the CPU): * On Linux ``` CL_CONTEXT_MPSIM_DEVICE_INTELFPGA=1 ./zero_copy_data_transfer.fpga_sim @@ -178,7 +191,7 @@ Locate `report.html` in the `zero_copy_data_transfer_report.prj/reports/` direct zero_copy_data_transfer.fpga_sim.exe set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device: +3. 
Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ``` ./zero_copy_data_transfer.fpga (Linux) zero_copy_data_transfer.fpga.exe (Windows) diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/src/CMakeLists.txt index 55245b4cc6..a254b19412 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/src/CMakeLists.txt @@ -7,18 +7,31 @@ set(REPORTS_TARGET ${TARGET_NAME}_report) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_s10sx_pac:pac_s10_usm") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Stratix(R) 10 SX FPGA with USM support). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") + set(IS_BSP "0") + set(BSP_FLAG "") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") + + # Check if the target is a BSP + if(IS_BSP MATCHES "1" OR FPGA_DEVICE MATCHES ".*pac_a10.*|.*pac_s10.*") + set(IS_BSP "1") + set(BSP_FLAG "-DIS_BSP") + else() + set(IS_BSP "0") + set(BSP_FLAG "") + message(STATUS "The selected target ${FPGA_DEVICE} is assumed to be an FPGA part number, so the IS_BSP macro will not be passed to your C++ code and USM will be enabled by default.") + message(STATUS "If the target is actually a BSP, run cmake with -DIS_BSP=1 to pass the IS_BSP macro to your C++ code and USM checks are performed.") + endif() endif() # this tutorial requires USM host allocations. Check the BSP name (which should contain the text 'usm') # to ensure the BSP has the required support. Allow the user to define USM_HOST_ALLOCATIONS_ENABLED # to override this check (e.g., cmake .. -DUSM_HOST_ALLOCATIONS_ENABLED=1) -if(NOT FPGA_DEVICE MATCHES ".usm.*" AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0")) +if((IS_BSP STREQUAL "1") AND (NOT FPGA_DEVICE MATCHES ".usm.*") AND (NOT DEFINED USM_HOST_ALLOCATIONS_ENABLED OR USM_HOST_ALLOCATIONS_ENABLED STREQUAL "0")) message(FATAL_ERROR "ERROR: This tutorial requires a BSP that has USM host allocations enabled.") endif() @@ -35,12 +48,12 @@ endif() # 1. The "compile" stage compiles the device code to an intermediate representation (SPIR-V). # 2. The "link" stage invokes the compiler's FPGA backend before linking. # For this reason, FPGA backend flags must be passed as link flags in CMake. 
-set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Wall -DFPGA_EMULATOR ${DEVICE_FLAG}") -set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga") -set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -Wall -DFPGA_SIMULATOR ${DEVICE_FLAG}") -set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xshyper-optimized-handshaking=off -Xstarget=${FPGA_DEVICE} ${DEVICE_FLAG} ${USER_SIMULATOR_FLAGS}") -set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Wall ${DEVICE_FLAG} -DFPGA_HARDWARE") -set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware -Xshyper-optimized-handshaking=off -Xstarget=${FPGA_DEVICE} ${DEVICE_FLAG} ${USER_HARDWARE_FLAGS}") +set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Wall -DFPGA_EMULATOR ${DEVICE_FLAG} ${BSP_FLAG}") +set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${BSP_FLAG}") +set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Xssimulation -Wall -DFPGA_SIMULATOR ${DEVICE_FLAG} ${BSP_FLAG}") +set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga -Xssimulation -Xshyper-optimized-handshaking=off -Xstarget=${FPGA_DEVICE} ${DEVICE_FLAG} ${USER_SIMULATOR_FLAGS} ${BSP_FLAG}") +set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga -Wall ${WIN_FLAG} -Wall ${DEVICE_FLAG} -DFPGA_HARDWARE ${BSP_FLAG}") +set(HARDWARE_LINK_FLAGS "-fsycl -fintelfpga -Xshardware -Xshyper-optimized-handshaking=off -Xstarget=${FPGA_DEVICE} ${DEVICE_FLAG} ${USER_HARDWARE_FLAGS} ${BSP_FLAG}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation ############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/src/buffer_kernel.hpp b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/src/buffer_kernel.hpp index 9b86facc61..b06c2b0886 100644 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/src/buffer_kernel.hpp +++ 
b/DirectProgramming/C++SYCL_FPGA/Tutorials/DesignPatterns/zero_copy_data_transfer/src/buffer_kernel.hpp @@ -28,8 +28,25 @@ double SubmitBufferKernel(queue& q, std::vector& in, std::vector& out, // launch the computation kernel auto kernel_event = q.submit([&](handler& h) { + +#if defined (IS_BSP) accessor in_a(in_buf, h, read_only); accessor out_a(out_buf, h, write_only, no_init); +#else + // When targeting an FPGA family/part, the compiler does not know + // if the two kernels accesses the same memory location + // With this property, we tell the compiler that these buffers + // are in a location "1" whereas the pointers from ExplicitKernel + // are in the default location "0" + sycl::ext::oneapi::accessor_property_list location_of_buffer{ + ext::intel::buffer_location<1>}; + accessor in_a(in_buf, h, read_only, location_of_buffer); + + sycl::ext::oneapi::accessor_property_list location_of_buffer_no_init{ + no_init, ext::intel::buffer_location<1>}; + accessor out_a(out_buf, h, write_only, location_of_buffer_no_init); +#endif + h.single_task([=]() [[intel::kernel_args_restrict]] { for (size_t i = 0; i < size; i++) { diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_fixed/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_fixed/README.md index 6be533e935..a1b38fb033 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_fixed/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_fixed/README.md @@ -5,7 +5,7 @@ This FPGA tutorial demonstrates how to use the Algorithmic C (AC) data type `ac_ | Optimized for | Description |:--- |:--- | OS | CentOS*Linux 8
Red Hat* Enterprise Linux*8
SUSE* Linux Enterprise Server 15
Ubuntu*18.04 LTS
Ubuntu 20.04
Windows* 10 -| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA
Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX)
Intel® FPGA 3rd party / custom platforms with oneAPI support
**Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04* +| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs | Software | Intel® oneAPI DPC++/C++ Compiler | What you will learn | How different methods of `ac_fixed` number construction affect hardware resource utilization
Recommended method for constructing `ac_fixed` numbers in your kernel
Accessing and using the `ac_fixed` math library functions
Trading off accuracy of results for reduced resource usage on the FPGA | Time to complete | 30 minutes @@ -18,6 +18,8 @@ This FPGA tutorial demonstrates how to use the Algorithmic C (AC) data type `ac_ > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Prerequisites @@ -162,28 +164,26 @@ When you use the `ac_fixed` library, keep the following points in mind: 1. Install the design in `build` directory from the design directory by running `cmake`: - ```bash - mkdir build - cd build - ``` - - If you are compiling for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - - ```bash - cmake .. - ``` - - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ```bash - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - - ```bash - cmake .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake .. + ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design using the generated `Makefile`. 
The following four build targets are provided that match the recommended development flow: @@ -211,34 +211,29 @@ When you use the `ac_fixed` library, keep the following points in mind: make fpga ``` -3. (Optional) As the earlier hardware compile can take several hours to complete, FPGA precompiled binaries (compatible with Ubuntu 18.04) can be downloaded [here](https://iotdk.intel.com/fpga-precompiled-binaries/latest/ac_fixed.fpga.tar.gz). - ### On a Windows* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. - ``` - - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake -G "NMake Makefiles" .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. 
The following build targets are provided, matching the recommended development flow: @@ -266,8 +261,6 @@ When you use the `ac_fixed` library, keep the following points in mind: nmake fpga ``` -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA hardware on Windows* requires a third-party or custom Board Support Package (BSP) with Windows* support. - > **Note**: If you encounter any issues with long paths when compiling under Windows*, you might have to create your `build` directory in a shorter path, for example `c:\samples\build`. You can then run `cmake` from that directory, and provide `cmake` with the full path to your sample directory. ## Examining the Reports @@ -288,7 +281,7 @@ Scroll down on the Summary page of the report and expand the section titled **Co ac_fixed.fpga_emu.exe (Windows) ``` -2. Run the sample of the FPGA simulator device +2. Run the sample of the FPGA simulator device (the kernel executes on the CPU): * On Linux ```bash @@ -301,7 +294,7 @@ Scroll down on the Summary page of the report and expand the section titled **Co set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device +3. 
Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ```bash ./ac_fixed.fpga (Linux) diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_fixed/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_fixed/src/CMakeLists.txt index 06249e736b..9d7c4f182a 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_fixed/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_fixed/src/CMakeLists.txt @@ -6,12 +6,12 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). \ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() # These are Windows-specific flags: @@ -32,6 +32,8 @@ set(EMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} -DFPGA_EMULATOR set(EMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG}") set(SIMULATOR_COMPILE_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} -DFPGA_SIMULATOR -Wall ${WIN_FLAG}") set(SIMULATOR_LINK_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} -Xssimulation -Xsghdl -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}") +set(REPORT_COMPILE_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} -Wall ${WIN_FLAG} -DFPGA_REPORT") +set(REPORT_LINK_FLAGS "-fsycl -fintelfpga -Xshardware -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}") set(HARDWARE_COMPILE_FLAGS "-fsycl -fintelfpga ${AC_TYPES_FLAG} -Wall ${WIN_FLAG} -DFPGA_HARDWARE") set(HARDWARE_LINK_FLAGS "-fsycl 
-fintelfpga ${AC_TYPES_FLAG} -Xshardware -Xstarget=${FPGA_DEVICE} ${USER_HARDWARE_FLAGS}") # use cmake -D USER_HARDWARE_FLAGS= to set extra flags for FPGA backend compilation @@ -60,8 +62,8 @@ set(FPGA_EARLY_IMAGE ${TARGET_NAME}_report.a) add_executable(${FPGA_EARLY_IMAGE} ${SOURCE_FILE}) target_include_directories(${FPGA_EARLY_IMAGE} PRIVATE ../../../../include) add_custom_target(report DEPENDS ${FPGA_EARLY_IMAGE}) -set_target_properties(${FPGA_EARLY_IMAGE} PROPERTIES COMPILE_FLAGS "${HARDWARE_COMPILE_FLAGS}") -set_target_properties(${FPGA_EARLY_IMAGE} PROPERTIES LINK_FLAGS "${HARDWARE_LINK_FLAGS} -fsycl-link=early") +set_target_properties(${FPGA_EARLY_IMAGE} PROPERTIES COMPILE_FLAGS "${REPORT_COMPILE_FLAGS}") +set_target_properties(${FPGA_EARLY_IMAGE} PROPERTIES LINK_FLAGS "${REPORT_LINK_FLAGS} -fsycl-link=early") # fsycl-link=early stops the compiler after RTL generation, before invoking Quartus® ############################################################################### diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_int/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_int/README.md index e2d4fff73f..13f2b4c13b 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_int/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_int/README.md @@ -5,7 +5,7 @@ This FPGA tutorial demonstrates how to use the Algorithmic C (AC) data type `ac_ | Optimized for | Description |:--- |:--- | OS | Linux* Ubuntu* 18.04/20.04
<br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10
-| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA <br> Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) <br> Intel® FPGA 3rd party / custom platforms with oneAPI support <br> **Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
 | Software | Intel® oneAPI DPC++/C++ Compiler
 | What you will learn | Using the `ac_int` data type for basic operations <br> Efficiently using the left shift operation <br>
Setting and reading certain bits of an `ac_int` number | Time to complete | 20 minutes @@ -18,6 +18,8 @@ This FPGA tutorial demonstrates how to use the Algorithmic C (AC) data type `ac_ > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Prerequisites @@ -48,7 +50,7 @@ This FPGA tutorial shows how to use the `ac_int` data type with some simple exam This data type can be used in place of native integer types to generate area efficient and optimized designs for the FPGA. When you have a computation that does not require the full dynamic range of a 32-bit integer, you should replace your `int` variables with `ac_int` variables of the correct, reduced width. For example, if you know that a loop will iterate from 0 to 12, only 4 bits are required. -Please refer to the [FPGA Optimization Guide for Intel® oneAPI Toolkits Developer Guide](https://software.intel.com/content/www/us/en/develop/documentation/oneapi-fpga-optimization-guide/top/optimize-your-design/resource-use/data-types-and-operations/var-prec-fp-sup/adv-disadv-ac-dt.html) to see advantages and limitations of `ac_int` data types. +Please refer to the [FPGA Optimization Guide for Intel® oneAPI Toolkits Developer Guide](https://software.intel.com/content/www/us/en/develop/documentation/oneapi-fpga-optimization-guide/top/optimize-your-design/resource-use/data-types-and-operations/var-prec-fp-sup/adv-disadv-ac-dt.html) to see advantages and limitations of `ac_int` data types. ### Simple Code Example @@ -142,28 +144,26 @@ Kernel `BitOps` demonstrates bit operations with bit select operator `[]` and bi 1. 
Install the design in `build` directory from the design directory by running `cmake`: - ```bash - mkdir build - cd build - ``` - - If you are compiling for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - - ```bash - cmake .. - ``` - - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ```bash - cmake .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - - ```bash - cmake .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake .. + ``` + + > **Note**: You can change the default target by using the command: + > ``` + > cmake .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design using the generated `Makefile`. The following four build targets are provided that match the recommended development flow: @@ -191,34 +191,29 @@ Kernel `BitOps` demonstrates bit operations with bit select operator `[]` and bi make fpga ``` -3. (Optional) As the above hardware compile may take several hours to complete, FPGA precompiled binaries (compatible with Linux* Ubuntu* 18.04) can be downloaded here. - ### On a Windows* System 1. Generate the `Makefile` by running `cmake`. - ``` - mkdir build - cd build - ``` - - To compile for the Intel® PAC with Intel Arria® 10 GX FPGA, run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. 
- ``` - - Alternatively, to compile for the Intel® FPGA PAC D5005 (with Intel Stratix® 10 SX), run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=intel_s10sx_pac:pac_s10 - ``` - - You can also compile for a custom FPGA platform. Ensure that the board support package is installed on your system. Then run `cmake` using the command: - - ``` - cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: - ``` + ``` + mkdir build + cd build + ``` + To compile for the default target (the Agilex™ device family), run `cmake` using the command: + ``` + cmake -G "NMake Makefiles" .. + ``` + > **Note**: You can change the default target by using the command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE= + > ``` + > + > Alternatively, you can target an explicit FPGA board variant and BSP by using the following command: + > ``` + > cmake -G "NMake Makefiles" .. -DFPGA_DEVICE=: + > ``` + > + > You will only be able to run an executable on the FPGA if you specified a BSP. 2. Compile the design through the generated `Makefile`. The following build targets are provided, matching the recommended development flow: @@ -246,11 +241,6 @@ Kernel `BitOps` demonstrates bit operations with bit select operator `[]` and bi nmake fpga ``` -> **Note**: The Intel® PAC with Intel Arria® 10 GX FPGA and Intel® FPGA PAC D5005 -(with Intel Stratix® 10 SX) do not yet support Windows*. Compiling to FPGA -hardware on Windows* requires a third-party or custom Board Support Package -(BSP) with Windows* support. - > **Note**: If you encounter any issues with long paths when compiling under Windows*, you may have to create your ‘build’ directory in a shorter path, for example c:\samples\build. You can then run @@ -274,7 +264,7 @@ Navigate to *System Viewer* (*Views* > *System Viewer*) and find the cluster in ac_int.fpga_emu.exe (Windows) ``` -2. Run the sample of the FPGA simulator device +2. 
Run the sample of the FPGA simulator device (the kernel executes on the CPU): * On Linux ```bash @@ -287,7 +277,7 @@ Navigate to *System Viewer* (*Views* > *System Viewer*) and find the cluster in set CL_CONTEXT_MPSIM_DEVICE_INTELFPGA= ``` -3. Run the sample on the FPGA device +3. Run the sample on the FPGA device (only if you ran `cmake` with `-DFPGA_DEVICE=:`): ```bash ./ac_int.fpga (Linux) diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_int/src/CMakeLists.txt b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_int/src/CMakeLists.txt index 0d127d1e79..aebece6e88 100755 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_int/src/CMakeLists.txt +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/ac_int/src/CMakeLists.txt @@ -6,12 +6,12 @@ set(FPGA_TARGET ${TARGET_NAME}.fpga) # FPGA board selection if(NOT DEFINED FPGA_DEVICE) - set(FPGA_DEVICE "intel_a10gx_pac:pac_a10") + set(FPGA_DEVICE "Agilex") message(STATUS "FPGA_DEVICE was not specified.\ - \nConfiguring the design to run on the default FPGA board ${FPGA_DEVICE} (Intel(R) PAC with Intel Arria(R) 10 GX FPGA). 
\ - \nPlease refer to the README for information on board selection.") + \nConfiguring the design to the default FPGA family: ${FPGA_DEVICE}\ + \nPlease refer to the README for information on target selection.") else() - message(STATUS "Configuring the design to run on FPGA board ${FPGA_DEVICE}") + message(STATUS "Configuring the design with the following target: ${FPGA_DEVICE}") endif() # These are Windows-specific flags: diff --git a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/dsp_control/README.md b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/dsp_control/README.md index fa3aa09284..492992d341 100644 --- a/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/dsp_control/README.md +++ b/DirectProgramming/C++SYCL_FPGA/Tutorials/Features/dsp_control/README.md @@ -3,9 +3,9 @@ This FPGA tutorial demonstrates how to set the implementation preference for certain math operations (addition, subtraction, and multiplication) between hardened DSP blocks and soft logic. | Optimized for | Description -|:--- |:--- +|:--- |:--- | OS | Linux* Ubuntu* 18.04/20.04
<br> RHEL*/CentOS* 8 <br> SUSE* 15 <br> Windows* 10
-| Hardware | Intel® Programmable Acceleration Card (PAC) with Intel Arria® 10 GX FPGA <br> Intel® FPGA Programmable Acceleration Card (PAC) D5005 (with Intel Stratix® 10 SX) <br> Intel® FPGA 3rd party / custom platforms with oneAPI support <br> **Note**: Intel® FPGA PAC hardware is only compatible with Ubuntu 18.04*
+| Hardware | Intel® Agilex™, Arria® 10, and Stratix® 10 FPGAs
 | Software | Intel® oneAPI DPC++/C++ Compiler
 | What you will learn | How to apply global DSP control in command-line interface. <br> How to apply local DSP control in source code. <br>
Scope of datatypes and math operations that support DSP control. | Time to complete | 15 minutes @@ -18,6 +18,8 @@ This FPGA tutorial demonstrates how to set the implementation preference for cer > - ModelSim® SE > > When using the hardware compile flow, Intel® Quartus® Prime Pro Edition must be installed and accessible through your PATH. +> +> :warning: Make sure you add the device files associated with the FPGA that you are targeting to your Intel® Quartus® Prime installation. ## Prerequisites @@ -114,28 +116,26 @@ The second template argument `Propagate::