From 1b9ca05649f7007ae4d62829ddbfb43aef448ffb Mon Sep 17 00:00:00 2001 From: "Chen, Sheng S" Date: Mon, 5 Feb 2024 16:33:30 +0800 Subject: [PATCH 1/5] [SYCLomatic]Update bitcracker third party code. Signed-off-by: Chen, Sheng S --- .../Velocity-Bench/bitcracker/README.md | 222 +++++++++++---- .../bitcracker/bitcracker_migration.md | 263 ------------------ 2 files changed, 163 insertions(+), 322 deletions(-) delete mode 100644 third-party-programs/Velocity-Bench/bitcracker/bitcracker_migration.md diff --git a/third-party-programs/Velocity-Bench/bitcracker/README.md b/third-party-programs/Velocity-Bench/bitcracker/README.md index ffd6b1601..5de47c038 100644 --- a/third-party-programs/Velocity-Bench/bitcracker/README.md +++ b/third-party-programs/Velocity-Bench/bitcracker/README.md @@ -1,85 +1,189 @@ -# BitCracker +# Migration example: Migrate bitcracker to SYCL version +[SYCLomatic](https://github.com/oneapi-src/SYCLomatic) is a project to assist developers in migrating their existing code written in different programming languages to the SYCL* C++ heterogeneous programming model. It is an open source version of Intel® DPC++ Compatibility Tool. -BitCracker is the first open source password cracking tool for storage devices (original CUDA source code is from [here](https://github.com/e-ago/bitcracker)). +This file lists the detail steps to migrate CUDA version of [bitcracker](https://github.com/oneapi-src/Velocity-Bench/tree/main/bitcracker) to SYCL version with SYCLomatic. As follow table summaries the migration environment, software required and so on. 
-## Supported versions + | Optimized for | Description + |:--- |:--- + | OS | Linux* Ubuntu* 22.04 + | Software | Intel® oneAPI Base Toolkit, SYCLomatic + | What you will learn | Migrating CUDA code to SYCL and running the SYCL code on Intel devices with oneAPI + | Time to complete | 15 minutes -- CUDA: The original code was obtained from [here](https://github.com/e-ago/bitcracker) -- SYCL: The CUDA code was migrated using Intel DPCT, and then the resulting code was modified to remove the dpct headers. -- HIP: Created from CUDA version using hipify-perl script. -# Build Instructions +## Migrating bitcracker to SYCL -## To build for SYCL +### 1 Prepare the migration +#### 1.1 Get the source code of bitcracker and install the dependency libraries +```sh + $ git clone https://github.com/oneapi-src/Velocity-Bench.git + $ export bitcracker_HOME=/path/to/Velocity-Bench/bitcracker + $ cd ${bitcracker_HOME}/CUDA && mkdir build + $ cd build && cmake .. # make sure all dependency library are installed. +``` +Summary of bitcracker project source code: +``` + CUDA/ + ├── CMakeLists.txt + └── src + ├── aes.h + ├── attack.cu + ├── bitcracker.h + ├── main.cu + ├── sha256.h + ├── utils.cu + └── w_blocks.cu +``` + +#### 1.2 Prepare migration tool and SYCL run environment -For Intel GPU - -First, source icpx compiler. Then, + * Install SYCL run environment [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html). After install, Intel® DPC++ Compatibility tool is also availalbe, setup the SYCL run environment as follow: ``` -cd bitcracker/SYCL -mkdir build -cd build -CXX=icpx cmake -DGPU_AOT=pvc .. -make -sj + $ source /opt/intel/oneapi/setvars.sh + $ dpct --version # Intel® DPC++ Compatibility tool version ``` -Note: -- To enable AOT compilation, please use the flag `-DGPU_AOT=pvc` for PVC. - -For AMD GPU - -First source clang++ compiler.
Then, + * If want to try latest version of compatibility tool, try to install SYCLomatic by download prebuild of [SYCLomatic release](https://github.com/oneapi-src/SYCLomatic/blob/SYCLomatic/README.md#Releases) or [build from source](https://github.com/oneapi-src/SYCLomatic/blob/SYCLomatic/README.md), as follow give the steps to install prebuild version: + ``` + $ export SYCLomatic_HOME=/path/to/install/SYCLomatic + $ mkdir $SYCLomatic_HOME + $ cd $SYCLomatic_HOME + $ wget https://github.com/oneapi-src/SYCLomatic/releases/download/20240203/linux_release.tgz # Replace 20240203 with the timestamp of the latest release + $ tar xzvf linux_release.tgz + $ source setvars.sh + $ dpct --version #SYCLomatic version + ``` + +For more information on configuring environment variables, see [Use the setvars Script with Linux*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html). + +### 2 Generate the compilation database +``` sh +$ cd ${bitcracker_HOME}/CUDA/build +$ make clean +$ intercept-build make +$ ls compile_commands.json # make sure compile_commands.json is generated +compile_commands.json ``` -cd bitcracker/SYCL -mkdir build -cd build -CXX=clang++ cmake -DUSE_AMDHIP_BACKEND=gfx90a .. -make -sj +### 3 Migrate the source code and build script +```sh +# Use the CUDA directory as the migration root: +$ cd ${bitcracker_HOME}/CUDA +$ dpct --in-root=. -p=./build/compile_commands.json --out-root=out --gen-build-script --cuda-include-path=/usr/local/cuda/include ``` -Note: -- We use the flag `-DUSE_AMDHIP_BACKEND=gfx90a` for MI250. Use the correct value for your GPU. +Description of the options: + * `--in-root`: provide input files to specify where to locate the CUDA files that needs migration. + * `-p`: specify compilation database to migrate the whole project. + * `--out-root`: designate where to generate the resulting files (default is `dpct_output`).
+ * `--gen-build-script`: generate the `Makefile.dpct` for the migrated code. -For NVIDIA GPU - -First source clang++ compiler. Then, +Now you can see the migrated files in the `out` folder as follow: ``` -cd bitcracker/SYCL -mkdir build -cd build -CXX=clang++ cmake -DUSE_NVIDIA_BACKEND=YES -DUSE_SM=80 .. -make -sj + out/ + ├── MainSourceFiles.yaml + ├── Makefile.dpct + └── src + ├── aes.h + ├── aes.h.yaml + ├── attack.dp.cpp + ├── bitcracker.h + ├── bitcracker.h.yaml + ├── main.dp.cpp + ├── sha256.h + ├── sha256.h.yaml + ├── utils.dp.cpp + └── w_blocks.dp.cpp ``` -Note: -- We use the flag `-DUSE_SM=80` for A100 or `-DUSE_SM=90` for H100. +### 4 Review the migrated source code and fix all `DPCT` warnings -## To build for CUDA +SYCLomatic and [Intel® DPC++ Compatibility Tool](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html) define a list of `DPCT` warnings and embed the warning in migrated source code if need manual effort to check. All the warnings in the migrated code should be reviewed and fixed. For detail of `DPCT` warnings and corresponding fix examples, refer to [Intel® DPC++ Compatibility Tool Developer Guide and Reference](https://www.intel.com/content/www/us/en/develop/documentation/intel-dpcpp-compatibility-tool-user-guide/top/diagnostics-reference.html) or [SYCLomatic doc page](https://oneapi-src.github.io/SYCLomatic/dev_guide/diagnostics-reference.html). +Fix the warning in migrated bitcracker code: ``` -cd bitcracker/CUDA -mkdir build -cd build -CXX=nvcc cmake -DUSE_SM=80 .. -make -sj +$ cat ${bitcracker_HOME}/CUDA/out/src/attack.dp.cpp +... +/* +DPCT1110:3: The total declared local variable size in device function +decrypt_vmk_with_mac exceeds 128 bytes and may cause high register pressure. +Consult with your hardware vendor to find the total register size available and +adjust the code, or use smaller sub-group size to avoid high register pressure. +*/ + +void decrypt_vmk_with_mac( +... 
``` +This message is shown because the Compatibility Tool finding the user-declared private memeory size of local variable in the kernel will exceed 128 bytes which is the largest register size for each work-item on the Intel® XE core when the sub-group size is 32. It may cause high register pressure. -Note: -- We use the flag `-DUSE_SM=80` for A100 or `-DUSE_SM=90` for H100. - -## To build for HIP - +In **out/src/attack.dp.cpp**, the application defined 56 **uint32_t** type value, totally need 224 bytes memory which exceed the 128 bytes on the XE GPU vector engine register size. The migrated code didn't specify the sub group size, let compiler to determine the size. And user can explicitly specify the sub group size to 16 by ```[[intel::reqd_sub_group_size(16)]]``` after the submit function. Manually changing is add the reqd_sub_group_size: ``` -cd bitcracker/HIP -mkdir build -cd build -CXX=hipcc cmake -DROCM_PATH=/opt/rocm .. -make -sj + cgh.parallel_for( + sycl::nd_range<3>(sycl::range<3>(1, 1, num_blocks) * + sycl::range<3>(1, 1, block_size), + sycl::range<3>(1, 1, block_size)), + [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(16)]] { + decrypt_vmk_with_mac( + num_read_pswd, d_found, d_vmk, d_vmkIV, d_mac, d_macIV, + d_computedMacIV, v0, v1, v2, v3, s0, s1, s2, s3, + d_pswd_uint32, d_w_words_uint32, item_ct1, TS0_ptr_ct1, + TS1_ptr_ct1, TS2_ptr_ct1, TS3_ptr_ct1); + }); ``` +### 5 Build the migrated bitcracker +``` +$ cd ${bitcracker_HOME}/CUDA/out +$ make -f Makefile.dpct +``` +### 6 Run the migrated SYCL version of bitcracker +``` +$: ./bitcracker -f ../../hash_pass/img_win8_user_hash.txt -d ../../hash_pass/user_passwords_60000.txt -b 60000 +---------> BitCracker: BitLocker password cracking tool <--------- + + +================================== +Retrieving Info +================================== + +Reading hash file "../../hash_pass/img_win8_user_hash.txt" +================================================ + Attack
+================================================ +Type of attack: User Password +Psw per thread: 1 +max_num_pswd_per_read: 60000 +Dictionary: ../../hash_pass/user_passwords_60000.txt +MAC Comparison (-m): Yes + + +Iter: 1, num passwords read: 60000 +Kernel execution: + Effective passwords: 60000 + Passwords Range: + npknpByH7N2m3OnLNH1X9DJxLrzIFWk + ..... + dL_7uuf3QCz-c6K3xDu0 +-------------------- +================================================ +Bitcracker attack completed +Total passwords evaluated: 60000 +Password not found! +================================================ +time to subtract from total: 0.0148924 s +bitcracker - total time for whole calculation: 452.283 s +``` +**Note:** +* The testing result was running on Intel(R) Core(TM) i7-13700K CPU backend with Intel® oneAPI Base Toolkit(2023.2 version). +* The Reference migrated code is attached in **migrated** folder. -# Run instructions +If an error occurs during runtime, refer to [Diagnostics Utility for Intel® oneAPI Toolkits](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html). -After building, to run the workload, cd into the build folder. Then +## bitcracker License +[License.txt](https://github.com/oneapi-src/Velocity-Bench/blob/main/bitcracker/LICENSE.md) -``` -./bitcracker -f ../../hash_pass/img_win8_user_hash.txt -d ../../hash_pass/user_passwords_60000.txt -b 60000 -``` +## Reference +* Command Line Options of [SYCLomatic](https://oneapi-src.github.io/SYCLomatic/dev_guide/command-line-options-reference.html) or [Intel® DPC++ Compatibility Tool](https://software.intel.com/content/www/us/en/develop/documentation/intel-dpcpp-compatibility-tool-user-guide/top/command-line-options-reference.html) +* [oneAPI GPU Optimization Guide](https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/) +* [SYCLomatic project](https://github.com/oneapi-src/SYCLomatic/) -# Output -Output gives the total time for running the whole workload. 
+## Trademarks information +Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
+\*Other names and brands may be claimed as the property of others. SYCL is a trademark of the Khronos Group Inc. diff --git a/third-party-programs/Velocity-Bench/bitcracker/bitcracker_migration.md b/third-party-programs/Velocity-Bench/bitcracker/bitcracker_migration.md deleted file mode 100644 index fdcd6bd52..000000000 --- a/third-party-programs/Velocity-Bench/bitcracker/bitcracker_migration.md +++ /dev/null @@ -1,263 +0,0 @@ -# SYCLomatic Tool: Migrate bitcracker APP -## Use the command line to migrate large code base. -The SYCLomatic project (the Open source version of Intel® DPC++ Compatibility Tool) can migrate project that contain multiple source and header files. -| Optimized for | Description -|:--- |:--- -| OS | Linux* Ubuntu* 22.04 -| Software | Intel® DPC++ Compatibility Tool -| What you will learn | Simple invocation of dpct to migrate CUDA code -| Time to complete | 15 minutes - - -# Purpose -The SYCLomatic tool can migrate projects composed with multiple source and header files. -Used the dpct option **--in-root** option to set the root location of your prepared migration APP. Only the files under this specified root will be considered to migrate. Files located outside the **--in-root** will be considered system files or libraries files and will not be migrated. - -The dpct **--out-root** will specify the directory into which generated SYCL*-compilant code producted by the dpct tool is written. The relative path and the name will be kept, except the file extensions are changed to **.dp.cpp**. - - -# Key Implementation Details -Except the --in-root and --out-root options, there are additional options can help to migrate the code more smoothly: [Command Line Options Reference](https://software.intel.com/content/www/us/en/develop/documentation/intel-dpcpp-compatibility-tool-user-guide/top/command-line-options-reference.html). 
- - - -## Migrating the CUDA Sample to Data Parallel C++ with the Intel® DPC++ Compatibility Tool - -Building and running the CUDA sample is not required to migrate this project -to a SYCL*-compliant project. - -> **Note**: Certain CUDA header files, referenced by the CUDA application -> source files to be migrated, need to be accessible for the migration step. -> See *Before you Begin* in [Get Started with the Intel® DPC++ Compatibility Tool](https://www.intel.com/content/www/us/en/develop/documentation/get-started-with-intel-dpcpp-compatibility-tool/top.html#top_BEFORE_YOU_BEGIN). - -> **Note**: If you have not already done so, set up your CLI -> environment by sourcing the `setvars` script located in -> the root of your oneAPI installation. -> -> Linux*: -> - For system wide installations: `. /opt/intel/oneapi/setvars.sh` -> - For private installations: `. ~/intel/oneapi/setvars.sh` -> - For non-POSIX shells, like csh, use the following command: `$ bash -c 'source /setvars.sh ; exec csh'` -> -> Windows*: -> - `C:\Program Files(x86)\Intel\oneAPI\setvars.bat` -> - For Windows PowerShell*, use the following command: `cmd.exe "/K" '"C:\Program Files (x86)\Intel\oneAPI\setvars.bat" && powershell'` -> -> For more information on configuring environment variables, see [Use the setvars Script with Linux* or MacOS*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-linux-or-macos.html) or [Use the setvars Script with Windows*](https://www.intel.com/content/www/us/en/develop/documentation/oneapi-programming-guide/top/oneapi-development-environment-setup/use-the-setvars-script-with-windows.html). - - -### Command-Line on a Linux* System - -1. 
This sample project contains a simple CUDA program with eight files - (CUDA/CMakeLists.txt, CUDA/src/aes.h, CUDA/src/attack.cu, CUDA/src/bitcracker.h, CUDA/src/main.cu, CUDA/src/sha256.h, CUDA/src/utils.cu and CUDA/src/w_blocks.cu) located in CUDA directory and the sub-directory src of CUDA: - -``` -CUDA -├── CMakeLists.txt -└── src - ├── aes.h - ├── attack.cu - ├── bitcracker.h - ├── main.cu - ├── sha256.h - ├── utils.cu - └── w_blocks.cu -``` -2. Make a `build` directory to use the **cmake** command line tool to generate the corresponding build tool (make) directly. -```sh -$ cd CUDA && mkdir build -$ cd build && cmake .. -``` -3. Use the **intercept-build** tool to intercept the build step to generate the compilation database `compile_commands.json` file under the same fodler. -``` sh -$ intercept-build make -$ ls . -CMakeCache.txt CMakeFiles Makefile bitcracker cmake_install.cmake compile_commands.json -``` -4. Use the tool's `--in-root` option and provide input files to specify where - to locate the CUDA files that needs migration; use the tool’s `--out-root` - option to designate where to generate the resulting files(default is `dpct_output`); use the tool's `-p` option to specify compilation database to migrate the whole project and use the `--gen-build-script` to generate the `Makefile.dpct` for the migrated code: - -```sh -# From the CUDA directory as root directory: -$ cd .. -$ dpct --in-root=. -p=./build/compile_commands.json --out-root=out --gen-build-script --cuda-include-path=/usr/local/cuda/include -``` - -> If an `--in-root` option is not specified, the directory of the first input -> source file is implied. If `--out-root` is not specified, `./dpct_output` -> is implied. 
- -You should see the migrated files in the `out` folder that was specified -by the `--out-root` option: - -``` -out/ -├── MainSourceFiles.yaml -├── Makefile.dpct -└── src - ├── aes.h - ├── aes.h.yaml - ├── attack.dp.cpp - ├── bitcracker.h - ├── bitcracker.h.yaml - ├── main.dp.cpp - ├── sha256.h - ├── sha256.h.yaml - ├── utils.dp.cpp - └── w_blocks.dp.cpp -``` - -5. Inspect the migrated source code, address any `DPCT` warnings generated - by the Intel® DPC++ Compatibility Tool, and verify the new program correctness. - -Warnings are printed to the console and added as comments in the migrated -source. See *Diagnostic Reference* in the [Intel® DPC++ Compatibility Tool Developer Guide and Reference](https://www.intel.com/content/www/us/en/develop/documentation/intel-dpcpp-compatibility-tool-user-guide/top/diagnostics-reference.html) for more information on what each warning means. - - -This sample should generate the following warnings: -``` -warning: *DPCT1110:0*: The total declared local variable size in device function -decrypt_vmk_with_mac exceeds 128 bytes and may cause high register pressure. -Consult with your hardware vendor to find the total register size available and -adjust the code, or use smaller sub-group size to avoid high register pressure. -``` - -``` -warning: DPCT1009:0: SYCL uses exceptions to report errors and does not use the error -codes. The original code was commented out and a warning string was inserted. -You need to rewrite this code. -``` - -See below **Addressing Warnings in the Migrated Code** to understand how to -resolve the warning. - - -6. Build the migrated code with generated Makefile.dpct -``` -$ cd out -$ make -f Makefile.dpct -# Please make sure the oneAPI package was installed before building the application. 
-``` - -# Addressing Warnings in Migrated Code - -Migration generated one warning for code that `dpct` could not migrate: -``` -warning: *DPCT1110:0*: The total declared local variable size in device function -decrypt_vmk_with_mac exceeds 128 bytes and may cause high register pressure. -Consult with your hardware vendor to find the total register size available and -adjust the code, or use smaller sub-group size to avoid high register pressure. -``` -This message is shown because the Compatibility Tool finding the user declared private memeory size of local variable in the kernel will exceed the 128 bytes which is the largest register size for the each work-item on the Intel XE core when the sub-group size is 32. It may cause high register pressure. - -Open **out/src/attack.dp.cpp** and find the error **DPCT1110**, the application defined 56 **uint32_t** type value, totally need 224 bytes private value which exceed the 128 bytes on the XE GPU vector engine register size. The migrated code didn't specify the sub group size, let compiler to determine the size. And user can explicitly specify the sub group size to 16 by ```[[intel::reqd_sub_group_size(16)]]``` after the submit function. - -``` - cgh.parallel_for( - sycl::nd_range<3>(sycl::range<3>(1, 1, num_blocks) * - sycl::range<3>(1, 1, block_size), - sycl::range<3>(1, 1, block_size)), - [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(16)]] { - decrypt_vmk_with_mac( - num_read_pswd, d_found, d_vmk, d_vmkIV, d_mac, d_macIV, - d_computedMacIV, v0, v1, v2, v3, s0, s1, s2, s3, - d_pswd_uint32, d_w_words_uint32, item_ct1, TS0_ptr_ct1, - TS1_ptr_ct1, TS2_ptr_ct1, TS3_ptr_ct1); - }); -``` - - -``` -warning: DPCT1009:5: SYCL uses exceptions to report errors and does not use the error -codes. The original code was commented out and a warning string was inserted. -You need to rewrite this code. 
-``` - -For **DPCT1009**, this message is shown because the Compatibility Tool migrated from returning an error code in the CUDA code to determine whether the CUDA execution was successful or not, but the SYCL uses the try-exception to catch failure of the API call. You can manually adjusting the code to generate the SYCL compliant code. - -Open out/src/bitcracker.h and locate the error **DPCT1009**. Then make the following changes: - -Remove the macro definition: -``` -#define CUDA_CHECK(call) \ - { dpct::err0 err = call; } -``` - -You need to change the macro expansion for all the files in the **out** directory and sub-directory. - -Strip the CUDA_CHECK macro expansion under the **out** folder: -``` -./src/bitcracker.h:115:#define CUDA_CHECK(call) \ -./src/attack.dp.cpp:941: CUDA_CHECK(DPCT_CHECK_ERROR( -./src/attack.dp.cpp:948: CUDA_CHECK(DPCT_CHECK_ERROR(h_pswd_char = sycl::malloc_host( -./src/attack.dp.cpp:956: CUDA_CHECK(DPCT_CHECK_ERROR(h_pswd_uint32 = sycl::malloc_host( -./src/attack.dp.cpp:1190: CUDA_CHECK( -./src/attack.dp.cpp:1192: CUDA_CHECK(DPCT_CHECK_ERROR( -./src/w_blocks.dp.cpp:202: CUDA_CHECK( -./src/w_blocks.dp.cpp:205: CUDA_CHECK( -./src/w_blocks.dp.cpp:210: CUDA_CHECK(DPCT_CHECK_ERROR( -./src/w_blocks.dp.cpp:214: CUDA_CHECK( -./src/w_blocks.dp.cpp:235: CUDA_CHECK( -./src/w_blocks.dp.cpp:247: CUDA_CHECK(DPCT_CHECK_ERROR(sycl::free(salt_d, dpct::get_in_order_queue()))); -./src/w_blocks.dp.cpp:248: CUDA_CHECK( -./src/main.dp.cpp:193: CUDA_CHECK(DPCT_CHECK_ERROR(dpct::select_device(0))); -./src/main.dp.cpp:205: CUDA_CHECK( -./src/main.dp.cpp:234: CUDA_CHECK(DPCT_CHECK_ERROR( -....... 
-``` -## Rebuild the migrated code -After manually addressing the warning error, you can rebuild the application: -``` -$ make -f Makefile.dpct clean -$ make -f Makefile.dpct -``` -# Example Output - -When you run the migrated application, you should see the following console -output: - -``` -$: ./bitcracker -f ../../hash_pass/img_win8_user_hash.txt -d ../../hash_pass/user_passwords_60000.txt -b 60000 ----------> BitCracker: BitLocker password cracking tool <--------- - - -================================== -Retrieving Info -================================== - -Reading hash file "../../hash_pass/img_win8_user_hash.txt" -================================================ - Attack -================================================ -Type of attack: User Password -Psw per thread: 1 -max_num_pswd_per_read: 60000 -Dictionary: ../../hash_pass/user_passwords_60000.txt -MAC Comparison (-m): Yes - - -Iter: 1, num passwords read: 60000 -Kernel execution: - Effective passwords: 60000 - Passwords Range: - npknpByH7N2m3OnLNH1X9DJxLrzIFWk - ..... - dL_7uuf3QCz-c6K3xDu0 --------------------- -================================================ -Bitcracker attack completed -Total passwords evaluated: 60000 -Password not found! -================================================ -time to subtract from total: 0.0148924 s -bitcracker - total time for whole calculation: 452.283 s -``` -**Note:** The testing result was running on Intel(R) Core(TM) i7-13700K on the CPU backend with 2023.2 oneAPI released oneAPI package. And the reference migrated code is attached in **migrated** folder. - -If an error occurs, troubleshoot the problem using the Diagnostics Utility for Intel® oneAPI Toolkits. -[Learn more](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html). - -## License -Code samples are licensed under the GNU General Public License version 2. 
See -[License.txt](https://github.com/oneapi-src/Velocity-Bench/blob/main/bitcracker/LICENSE.md) for details. From 91f54562264b9a9fc65af2344516938eb8980ac5 Mon Sep 17 00:00:00 2001 From: "Chen, Sheng S" Date: Wed, 13 Mar 2024 20:54:01 +0800 Subject: [PATCH 2/5] update --- third-party-programs/Velocity-Bench/bitcracker/README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/third-party-programs/Velocity-Bench/bitcracker/README.md b/third-party-programs/Velocity-Bench/bitcracker/README.md index 5de47c038..a25975793 100644 --- a/third-party-programs/Velocity-Bench/bitcracker/README.md +++ b/third-party-programs/Velocity-Bench/bitcracker/README.md @@ -113,7 +113,7 @@ void decrypt_vmk_with_mac( ``` This message is shown because the Compatibility Tool finding the user-declared private memeory size of local variable in the kernel will exceed 128 bytes which is the largest register size for each work-item on the Intel® XE core when the sub-group size is 32. It may cause high register pressure. -In **out/src/attack.dp.cpp**, the application defined 56 **uint32_t** type value, totally need 224 bytes memory which exceed the 128 bytes on the XE GPU vector engine register size. The migrated code didn't specify the sub group size, let compiler to determine the size. And user can explicitly specify the sub group size to 16 by ```[[intel::reqd_sub_group_size(16)]]``` after the submit function. Manually changing is add the reqd_sub_group_size: +In **out/src/attack.dp.cpp**, the application defines 56 **uint32_t** type value, totally need 224 bytes memory which exceed the 128 bytes on the XE GPU vector engine register size. The migrated code have not specified the sub group size, let compiler to determine the size. And user can explicitly specify the sub group size to 16 by ```[[intel::reqd_sub_group_size(16)]]``` after the submit function. 
Manually changing is add the reqd_sub_group_size: ``` cgh.parallel_for( sycl::nd_range<3>(sycl::range<3>(1, 1, num_blocks) * @@ -170,7 +170,7 @@ time to subtract from total: 0.0148924 s bitcracker - total time for whole calculation: 452.283 s ``` **Note:** -* The testing result was running on Intel(R) Core(TM) i7-13700K CPU backend with Intel® oneAPI Base Toolkit(2023.2 version). +* The test results were collected on an Intel(R) Core(TM) i7-13700K using the CPU backend with Intel® oneAPI Base Toolkit (2023.2 version). * The Reference migrated code is attached in **migrated** folder. If an error occurs during runtime, refer to [Diagnostics Utility for Intel® oneAPI Toolkits](https://www.intel.com/content/www/us/en/develop/documentation/diagnostic-utility-user-guide/top.html). From 8d59a25f3cb29ff995ff14684a0115b473f74191 Mon Sep 17 00:00:00 2001 From: "Chen, Sheng S" Date: Thu, 14 Mar 2024 09:50:38 +0800 Subject: [PATCH 3/5] update --- .../Velocity-Bench/bitcracker/README.md | 16 +--------------- 1 file changed, 1 insertion(+), 15 deletions(-) diff --git a/third-party-programs/Velocity-Bench/bitcracker/README.md b/third-party-programs/Velocity-Bench/bitcracker/README.md index a25975793..94edc8a5a 100644 --- a/third-party-programs/Velocity-Bench/bitcracker/README.md +++ b/third-party-programs/Velocity-Bench/bitcracker/README.md @@ -111,22 +111,8 @@ adjust the code, or use smaller sub-group size to avoid high register pressure. void decrypt_vmk_with_mac( ... ``` -This message is shown because the Compatibility Tool finding the user-declared private memeory size of local variable in the kernel will exceed 128 bytes which is the largest register size for each work-item on the Intel® XE core when the sub-group size is 32. It may cause high register pressure. +This message is shown because the Compatibility Tool finding the user-declared private memeory size of local variable in the kernel may exceed 128 bytes which is the largest register size for each work-item.
It may cause high register pressure. For more details, you can refer [oneAPI GPU Optimization Guide](https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-0/overview.html) -In **out/src/attack.dp.cpp**, the application defines 56 **uint32_t** type value, totally need 224 bytes memory which exceed the 128 bytes on the XE GPU vector engine register size. The migrated code have not specified the sub group size, let compiler to determine the size. And user can explicitly specify the sub group size to 16 by ```[[intel::reqd_sub_group_size(16)]]``` after the submit function. Manually changing is add the reqd_sub_group_size: -``` - cgh.parallel_for( - sycl::nd_range<3>(sycl::range<3>(1, 1, num_blocks) * - sycl::range<3>(1, 1, block_size), - sycl::range<3>(1, 1, block_size)), - [=](sycl::nd_item<3> item_ct1) [[intel::reqd_sub_group_size(16)]] { - decrypt_vmk_with_mac( - num_read_pswd, d_found, d_vmk, d_vmkIV, d_mac, d_macIV, - d_computedMacIV, v0, v1, v2, v3, s0, s1, s2, s3, - d_pswd_uint32, d_w_words_uint32, item_ct1, TS0_ptr_ct1, - TS1_ptr_ct1, TS2_ptr_ct1, TS3_ptr_ct1); - }); -``` ### 5 Build the migrated bitcracker ``` $ cd ${bitcracker_HOME}/CUDA/out From 7fd09ac2d57db2514ddbdd9f0ff8776ceb6a5fa3 Mon Sep 17 00:00:00 2001 From: Shengchen Date: Sun, 24 Mar 2024 10:26:56 +0800 Subject: [PATCH 4/5] fix grammar issue. 
--- .../Velocity-Bench/bitcracker/README.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/third-party-programs/Velocity-Bench/bitcracker/README.md b/third-party-programs/Velocity-Bench/bitcracker/README.md index 94edc8a5a..a567815d8 100644 --- a/third-party-programs/Velocity-Bench/bitcracker/README.md +++ b/third-party-programs/Velocity-Bench/bitcracker/README.md @@ -1,7 +1,7 @@ # Migration example: Migrate bitcracker to SYCL version [SYCLomatic](https://github.com/oneapi-src/SYCLomatic) is a project to assist developers in migrating their existing code written in different programming languages to the SYCL* C++ heterogeneous programming model. It is an open source version of Intel® DPC++ Compatibility Tool. -This file lists the detail steps to migrate CUDA version of [bitcracker](https://github.com/oneapi-src/Velocity-Bench/tree/main/bitcracker) to SYCL version with SYCLomatic. As follow table summaries the migration environment, software required and so on. +This file lists the detailed steps to migrate the CUDA version of [bitcracker](https://github.com/oneapi-src/Velocity-Bench/tree/main/bitcracker) to SYCL with SYCLomatic. The following table summarizes the migration environment, the required software, and other details. | Optimized for | Description |:--- |:--- @@ -19,7 +19,7 @@ This file lists the detail steps to migrate CUDA version of [bitcracker](https:/ $ git clone https://github.com/oneapi-src/Velocity-Bench.git $ export bitcracker_HOME=/path/to/Velocity-Bench/bitcracker $ cd ${bitcracker_HOME}/CUDA && mkdir build - $ cd build && cmake .. # make sure all dependency library are installed. + $ cd build && cmake .. # Make sure all dependency libraries are installed.
``` Summary of bitcracker project source code: ``` @@ -37,13 +37,13 @@ Summary of bitcracker project source code: #### 1.2 Prepare migration tool and SYCL run environment - * Install SYCL run environment [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html). After install, Intel® DPC++ Compatibility tool is also availalbe, setup the SYCL run environment as follow: + * Install the [Intel® oneAPI Base Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/base-toolkit.html) to get the SYCL run environment; the Intel® DPC++ Compatibility Tool is included in the installation. Set up the SYCL run environment as follows: ``` $ source /opt/intel/oneapi/setvars.sh $ dpct --version # Intel® DPC++ Compatibility tool version ``` - * If want to try latest version of compatibility tool, try to install SYCLomatic by download prebuild of [SYCLomatic release](https://github.com/oneapi-src/SYCLomatic/blob/SYCLomatic/README.md#Releases) or [build from source](https://github.com/oneapi-src/SYCLomatic/blob/SYCLomatic/README.md), as follow give the steps to install prebuild version: + * To try the latest version of the compatibility tool, install SYCLomatic either by downloading a prebuilt [SYCLomatic release](https://github.com/oneapi-src/SYCLomatic/blob/SYCLomatic/README.md#Releases) or by [building it from source](https://github.com/oneapi-src/SYCLomatic/blob/SYCLomatic/README.md). The steps to install the prebuilt version are as follows: ``` $ export SYCLomatic_HOME=/path/to/install/SYCLomatic $ mkdir $SYCLomatic_HOME @@ -71,8 +71,8 @@ $ cd ${bitcracker_HOME}/CUDA $ dpct --in-root=. -p=./build/compile_commands.json --out-root=out --gen-build-script --cuda-include-path=/usr/local/cuda/include ``` Description of the options: - * `--in-root`: provide input files to specify where to locate the CUDA files that needs migration. - * `-p`: specify compilation database to migrate the whole project.
+ * `--in-root`: provide input files to specify where to locate the CUDA files that need migration. + * `-p`: specify the compilation database to migrate the whole project. * `--out-root`: designate where to generate the resulting files (default is `dpct_output`). * `--gen-build-script`: generate the `Makefile.dpct` for the migrated code. @@ -97,7 +97,7 @@ Now you can see the migrated files in the `out` folder as follow: SYCLomatic and [Intel® DPC++ Compatibility Tool](https://www.intel.com/content/www/us/en/developer/tools/oneapi/dpc-compatibility-tool.html) define a list of `DPCT` warnings and embed the warning in migrated source code if need manual effort to check. All the warnings in the migrated code should be reviewed and fixed. For detail of `DPCT` warnings and corresponding fix examples, refer to [Intel® DPC++ Compatibility Tool Developer Guide and Reference](https://www.intel.com/content/www/us/en/develop/documentation/intel-dpcpp-compatibility-tool-user-guide/top/diagnostics-reference.html) or [SYCLomatic doc page](https://oneapi-src.github.io/SYCLomatic/dev_guide/diagnostics-reference.html). -Fix the warning in migrated bitcracker code: +Fix the warning in the migrated bitcracker code: ``` $ cat ${bitcracker_HOME}/CUDA/out/src/attack.dp.cpp ... @@ -111,7 +111,7 @@ adjust the code, or use smaller sub-group size to avoid high register pressure. void decrypt_vmk_with_mac( ... ``` -This message is shown because the Compatibility Tool finding the user-declared private memeory size of local variable in the kernel may exceed 128 bytes which is the largest register size for each work-item. It may cause high register pressure. 
For more details, you can refer [oneAPI GPU Optimization Guide](https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-0/overview.html) +This message is shown because the Compatibility Tool finding the user-declared private memory size of the local variable in the kernel may exceed 128 bytes, which is the largest register size for each work-item. It may cause high register pressure. For more details, you can refer [oneAPI GPU Optimization Guide](https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-0/overview.html) ### 5 Build the migrated bitcracker ``` @@ -171,5 +171,5 @@ If an error occurs during runtime, refer to [Diagnostics Utility for Intel® one ## Trademarks information -Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries.
+Intel, the Intel logo, and other Intel marks are trademarks of Intel Corporation or its subsidiaries. \*Other names and brands may be claimed as the property of others. SYCL is a trademark of the Khronos Group Inc. From a9f2ec4dfa68bf7cd6c94e419c242208bde99233 Mon Sep 17 00:00:00 2001 From: "Wang, Zhiming" Date: Mon, 25 Mar 2024 16:29:04 +0800 Subject: [PATCH 5/5] Update README.md --- third-party-programs/Velocity-Bench/bitcracker/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/third-party-programs/Velocity-Bench/bitcracker/README.md b/third-party-programs/Velocity-Bench/bitcracker/README.md index a567815d8..c9a189cbc 100644 --- a/third-party-programs/Velocity-Bench/bitcracker/README.md +++ b/third-party-programs/Velocity-Bench/bitcracker/README.md @@ -1,5 +1,5 @@ # Migration example: Migrate bitcracker to SYCL version -[SYCLomatic](https://github.com/oneapi-src/SYCLomatic) is a project to assist developers in migrating their existing code written in different programming languages to the SYCL* C++ heterogeneous programming model. It is an open source version of Intel® DPC++ Compatibility Tool. +[SYCLomatic](https://github.com/oneapi-src/SYCLomatic) is a project to assist developers in migrating their existing code written in different programming languages to the SYCL* C++ heterogeneous programming model. It is an open source version of the Intel® DPC++ Compatibility Tool. This file lists the detailed steps to migrate CUDA version of [bitcracker](https://github.com/oneapi-src/Velocity-Bench/tree/main/bitcracker) to SYCL version with SYCLomatic. As follow table summarizes the migration environment, the software required, and so on.