oneapi-src · zhimingwang36 · Oct 31, 2024 · Oct 31, 2024
diff --git a/_images/DataFlowGraph.png b/_images/DataFlowGraph.png
diff --git a/_sources/dev_guide/frequently-asked-questions-faq.rst b/_sources/dev_guide/frequently-asked-questions-faq.rst
@@ -4,7 +4,7 @@ Frequently Asked Questions
 
 **General Information**
 
-* `How do I migrate source files that use C++11 or newer standard features on Linux\* and Windows\*?`_
+* `How do I migrate source files that use C++20 or newer standard features on Linux\* and Windows\*?`_
 * `How do I migrate files on Windows when using a CMake project?`_
 * `How is the migrated code formatted?`_
 * `Why does the compilation database not contain all source files in the project?`_
@@ -27,24 +27,25 @@ Frequently Asked Questions
 General Information
 -------------------
 
-How do I migrate source files that use C++11 or newer standard features on Linux\* and Windows\*?
+How do I migrate source files that use C++20 or newer standard features on Linux\* and Windows\*?
 *************************************************************************************************
 
 On Linux, the default C++ standard for |tool_name|'s
-parser is C++98, with some C++11 features
-accepted. If you want to enable other C++11 or newer standard
-features in |tool_name|, you need to add
+parser is C++17. If you want to enable newer standard features
+in |tool_name|, you need to add
 the ``--extra-arg="-std=<value>"`` option to the
 command line. The supported values are:
 
--  ``c++11``
 -  ``c++14``
 -  ``c++17``
+-  ``c++20``
+-  ``c++23``
+-  ``c++26``
 
 On Windows, the default C++ standard for |tool_name|'s
-parser is C++14. If you want to enable C++17
-features in |tool_name|, you need to add
-the option ``--extra-arg="-std=c++17"`` to the command line.
+parser is C++17. If you want to enable C++20 features
+in |tool_name|, you need to add
+the option ``--extra-arg="-std=c++20"`` to the command line.
 
 How do I migrate files on Windows when using a CMake project?
 *************************************************************
@@ -331,6 +332,32 @@ Based on these language standards |tool_name| emits the parsing error.
 
 You may need to adjust the source code.
 
+How do I resolve migration failure with "fatal error: 'cmath' file not found" in Linux?
+***************************************************************************************
+
+The problem stems from a missing include path for the C++ standard library headers.
+|tool_name| automatically detects the C++ header version to use by checking the compiler package directory ``/usr/lib/gcc/x86_64-linux-gnu`` and the C++ header directory ``/usr/include/c++``.
+In the following example, the tool selects C++ header version 12 because the version 12 compiler package is present, but the migration fails because the version 12 C++ headers are not installed.
+
+.. code-block:: 
+   :linenos:
+
+   ls /usr/lib/gcc/x86_64-linux-gnu
+   11 12
+   ls /usr/include/c++
+   11
+
+To fix this issue, please install the version 12 g++ package or libstdc++ package.
+
+.. code-block:: 
+   :linenos:
+
+   sudo apt install g++-12 
+   # or
+   sudo apt install libstdc++-12-dev 
+
+If your installation differs, install the missing ``g++-XX`` or ``libstdc++-XX-dev`` package, where ``XX`` is the version that appears in the output of ``ls /usr/lib/gcc/x86_64-linux-gnu`` but not in the output of ``ls /usr/include/c++``.
+
 How do I resolve incorrect runtime behavior for dpct::dev_mgr and dpct:mem_mgr in a library project that is loaded more than once in another application?
 ***********************************************************************************************************************************************************
 

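The version check described in the new "cmath" FAQ entry can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the tool's actual detection logic: the function names `list_versions`, `missing_header_versions`, and `suggest_packages` are hypothetical, and only the two directory paths and the package name patterns come from the FAQ text.

```python
import os

# Paths taken from the FAQ entry; everything else here is illustrative.
COMPILER_DIR = "/usr/lib/gcc/x86_64-linux-gnu"
HEADER_DIR = "/usr/include/c++"


def list_versions(path):
    """Return the entry names under path, or [] if it cannot be listed."""
    try:
        return sorted(os.listdir(path))
    except OSError:
        return []


def missing_header_versions(compiler_versions, header_versions):
    """Return compiler versions that have no matching C++ header directory."""
    return [v for v in compiler_versions if v not in header_versions]


def suggest_packages(missing):
    """Map each missing version to the package names given in the FAQ."""
    return [f"g++-{v} or libstdc++-{v}-dev" for v in missing]


if __name__ == "__main__":
    missing = missing_header_versions(
        list_versions(COMPILER_DIR), list_versions(HEADER_DIR)
    )
    for hint in suggest_packages(missing):
        print("Missing C++ headers; consider installing:", hint)
```

With the directory listings from the FAQ example (compiler versions 11 and 12, headers for 11 only), the check reports version 12 as missing.
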
diff --git a/_sources/dev_guide/migration/debug-with-codepin.rst b/_sources/dev_guide/migration/debug-with-codepin.rst
@@ -149,17 +149,17 @@ After migration, there will be two files: ``dpct_output_codepin_sycl/example.dp.
         q_ct1.memcpy(d_a, h_a, vectorSize * 12);
 
         // Launch the CUDA kernel
-        dpct::experimental::gen_prolog_API_CP(
-            "example.cu:38:3(SYCL)", &q_ct1,
-            VAR_SCHEMA_0, (long *)&d_a, VAR_SCHEMA_1, (long *)&d_result);
+        dpctexp::codepin::gen_prolog_API_CP(
+            "vectorAdd:example.cu:24:9",
+            &q_ct1, "d_a", d_a, "d_result", d_result);
         q_ct1.parallel_for(
             sycl::nd_range<3>(sycl::range<3>(1, 1, 4), sycl::range<3>(1, 1, 4)),
             [=](sycl::nd_item<3> item_ct1) { vectorAdd(d_a, d_result, item_ct1); });
 
         // Copy result from device to host
-        dpct::experimental::gen_epilog_API_CP(
-            "example.cu:38:3(SYCL)", &q_ct1,
-            VAR_SCHEMA_0, (long *)&d_a, VAR_SCHEMA_1, (long *)&d_result);
+        dpctexp::codepin::gen_epilog_API_CP(
+            "vectorAdd:example.cu:24:9",
+            &q_ct1, "d_a", d_a, "d_result", d_result);
 
         q_ct1.memcpy(h_result, d_result, vectorSize * sizeof(sycl::int3)).wait();
 
@@ -212,15 +212,15 @@ After migration, there will be two files: ``dpct_output_codepin_sycl/example.dp.
         cudaMemcpy(d_a, h_a, vectorSize * 12, cudaMemcpyHostToDevice);
 
         // Launch the CUDA kernel
-        dpct::experimental::gen_prolog_API_CP(
-            "example.cu:38:3", 0, VAR_SCHEMA_0,
-            (long *)&d_a, VAR_SCHEMA_1, (long *)&d_result);
+        dpctexp::codepin::gen_prolog_API_CP(
+            "vectorAdd:example.cu:24:9", 0,
+            "d_a", d_a, "d_result", d_result);
         vectorAdd<<<1, 4>>>(d_a, d_result);
 
         // Copy result from device to host
-        dpct::experimental::gen_epilog_API_CP(
-            "example.cu:38:3", 0, VAR_SCHEMA_0,
-            (long *)&d_a, VAR_SCHEMA_1, (long *)&d_result);
+        dpctexp::codepin::gen_epilog_API_CP(
+            "vectorAdd:example.cu:24:9", 0,
+            "d_a", d_a, "d_result", d_result);
         cudaMemcpy(h_result, d_result, vectorSize * sizeof(int3),
                     cudaMemcpyDeviceToHost);
 
@@ -254,6 +254,9 @@ the following execution log files will be generated.
           [
               {
                   "ID": "example.cu:26:3:prolog",
+                  "Device Name": "GPU",
+                  "Device ID": "0",
+                  "Stream Address": "0xe4bb30",
                   "Free Device Memory": "16374562816",
                   "Total Device Memory": "16882663424",
                   "Elapse Time(ms)": "0",
@@ -288,6 +291,9 @@ the following execution log files will be generated.
           [
               {
                   "ID": "example.cu:26:3:prolog",
+                  "Device Name": "GPU",
+                  "Device ID": "0",
+                  "Stream Address": "0x3fea40",
                   "Free Device Memory": "0",
                   "Total Device Memory": "31023112192",
                   "Elapse Time(ms)": "0",
@@ -322,6 +328,9 @@ programs start to diverge from one another.
 Analyse the CodePin Result
 --------------------------
 
+CodePin Report
+~~~~~~~~~~~~~~
+
 codepin-report.py (also can be triggered by dpct/c2s --codepin-report) is a functionality of
 the compatibility tool that consumes the execution log files from both CUDA and SYCL code and performs auto analysis.
 codepin-report.py can identify the inconsistent data value and report the stats data of the execution.
@@ -349,12 +358,29 @@ Following is an example of the analysis report.
 .. code-block::
 
     CodePin Summary
-    Totally APIs count, 2
-    Consistently APIs count, 2
-    Most Time-consuming Kernel(CUDA), example.cu:26:3:epilog, time:8.2316
-    Most Time-consuming Kernel(SYCL), example.cu:26:3:epilog, time:10.2575
-    Peak Device Memory Used(CUDA), 508100608
-    Peak Device Memory Used(SYCL), 31023112192
+    Total API count, 2
+    Consistent API count, 0
+    Most Time-consuming Kernel(CUDA), vectorAdd:example.cu:24:5:epilog, time:16.8069
+    Most Time-consuming Kernel(SYCL), vectorAdd:example.cu:24:5:prolog, time:18.3240
+    Peak Device Memory Used(CUDA), 445644800
+    Peak Device Memory Used(SYCL), 540689534976
     CUDA Meta Data ID, SYCL Meta Data ID, Type, Detail
-    example.cu:26:3:prolog,example.cu:26:3:prolog,Data value,[WARNING: METADATA MISMATCH] The pair of prolog data example.cu:26:3:prolog are mismatched,
-    and the corresponding pair of epilog data matches. This mismatch may be caused by the initialized memory or argument used in the API example.cu.
+    vectorAdd:example.cu:24:5:epilog,vectorAdd:example.cu:24:5:epilog,Data value,The location of failed ID Errors occurred during comparison: d_a->"Data"->[3]->"Data"->[0]->"x"->"Data"->[0] and [ERROR: DATA VALUE MISMATCH] the CUDA value 1 differs from the SYCL value 26518016.; d_result->"Data"->[3]->"Data"->[0]->"x"->"Data"->[0] and [ERROR: DATA VALUE MISMATCH] the CUDA value 2 differs from the SYCL value 26518017.
+    vectorAdd:example.cu:24:5:prolog,vectorAdd:example.cu:24:5:prolog,Data value,[WARNING: METADATA MISMATCH] The pair of prolog data vectorAdd:example.cu:24:5:prolog are mismatched, and the corresponding pair of epilog data matches. This mismatch may be caused by the initialized memory or argument used in the API vectorAdd.
+
+Data Flow Graph
+~~~~~~~~~~~~~~~
+
+codepin-report.py can generate a data flow graph for kernels with the ``--generate-data-flow-graph`` option. The data flow graph visualizes kernel execution and compares the results between CUDA and SYCL, highlighting execution mismatches between the CUDA and SYCL code.
+In the data flow graph, each kernel execution and its input and output arguments are grouped into a layer that shows the run status of the kernel execution. The values of input and output arguments are tagged with version information in the form “V<num>”. For example, the initial version is tagged as V0, and each time the value of the argument is updated, the version number increases. For a specific kernel execution, if there is a mismatch between the CUDA and SYCL results, the mismatched argument node is colored red.
+
+.. figure:: DataFlowGraph.png
+   :alt: DataFlowGraph
+   :align: center
+
+The figure above shows the data flow graph of the vectorAdd example, which consists of a title and an execution layer. The execution layer presents a kernel execution with its inputs and outputs. The kernel node shows that the kernel ``vectorAdd`` is executed on a stream of the device named GPU0, along with the kernel's execution time and source location. All input arguments (the ``d_a`` node and the top ``d_result`` node) are tagged with V0, indicating initial values. The output argument (the bottom ``d_result`` node) is tagged with V1 because ``d_result`` is both an input and an output argument, and its value changes in the kernel.
+
+The nodes ``d_a:V0``, ``d_result:V0``, and ``d_result:V1`` are colored red, indicating a value mismatch between the CUDA and SYCL runs. In this case, the mismatch in the result value is caused by the mismatch in the input argument values, which in turn may be caused by different memory initialization behavior between CUDA and SYCL, as the report states.
+
+The data flow graph provides a clear view of the execution process, making it easy to identify discrepancies and track variable changes across executions.
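
The version tagging described in the new Data Flow Graph section can be illustrated with a short Python sketch. It is not taken from codepin-report.py: the function `tag_versions` and its input shape are hypothetical, and the sketch assumes that every output argument of a kernel counts as updated.

```python
from collections import defaultdict


def tag_versions(executions):
    """Label kernel arguments with V<num> tags, one layer per kernel execution.

    executions: list of (kernel_name, input_names, output_names) tuples.
    Assumption: each output argument is treated as updated by the kernel.
    """
    current = defaultdict(int)  # argument name -> latest version number
    layers = []
    for kernel, inputs, outputs in executions:
        # Inputs carry whatever version the argument has before the kernel runs.
        in_nodes = [f"{name}:V{current[name]}" for name in inputs]
        out_nodes = []
        for name in outputs:
            current[name] += 1  # the kernel wrote a new value
            out_nodes.append(f"{name}:V{current[name]}")
        layers.append({"kernel": kernel, "inputs": in_nodes, "outputs": out_nodes})
    return layers


if __name__ == "__main__":
    for layer in tag_versions([("vectorAdd", ["d_a", "d_result"], ["d_result"])]):
        print(layer)
```

For the vectorAdd example this yields the input nodes `d_a:V0` and `d_result:V0` and the output node `d_result:V1`, matching the node names in the section.
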
diff --git a/_sources/dev_guide/migration/migration-rules.rst b/_sources/dev_guide/migration/migration-rules.rst
@@ -56,14 +56,17 @@ Migration rules are specified in YAML files. A single rule file may contain mult
      - Required. Specifies the priority of the rule: ``Takeover`` > ``Default`` > ``Fallback``.
        When there are rule conflicts, the rule with higher priority will take precedence.
    * - Kind
-     - ``Macro`` | ``API`` | ``Header`` | ``Type`` | ``Class`` | ``Enum`` | ``DisableAPIMigration`` | ``PatternRewriter`` | ``CMakeRule``
+     - ``Macro`` | ``API`` | ``Header`` | ``Type`` | ``Class`` | ``Enum`` | ``DisableAPIMigration`` | ``PatternRewriter`` | ``CMakeRule`` | ``PythonRule``
      - Required. Specifies the rule type.
    * - CmakeSyntax
      - String value
      - Required. Specify the CMake syntax name that will be migrated. Use the unique name for the CMake syntax.
+   * - PythonSyntax
+     - String value
+     - Optional. Specify the Python syntax name that will be migrated. Use the unique name for the Python syntax.
    * - MatchMode
-     - ``Partial`` | ``Full``
-     - Required. Specify the match mode with full word match or partial word match. If not specified, partial match mode will be used.
+     - ``Partial`` | ``Full`` | ``StrictFull``
+     - Optional. Specify the match mode: partial word match, full word match, or strict full word match. If not specified, partial match mode will be used. In ``Partial`` mode, the matched string can be surrounded by arbitrary characters, including whitespace. In ``Full`` mode, the matched string must not be surrounded by identifier characters (letters, digits, and underscores). In ``StrictFull`` mode, the matched string must be surrounded only by whitespace characters.
    * - In
      - String value
      - Required. Specifies the target name in the input source code.
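
The three ``MatchMode`` values described in this hunk can be approximated with regular expressions. The Python sketch below only illustrates the documented semantics and is not the tool's implementation: `find_matches` is a hypothetical helper, and treating the start and end of the text like whitespace in ``StrictFull`` mode is an assumption.

```python
import re


def find_matches(text, target, mode="Partial"):
    """Return start offsets where target matches text under the given mode."""
    escaped = re.escape(target)
    if mode == "Partial":
        # Any occurrence, regardless of the surrounding characters.
        pattern = escaped
    elif mode == "Full":
        # Not adjacent to identifier characters (letters, digits, underscore).
        pattern = r"(?<![A-Za-z0-9_])" + escaped + r"(?![A-Za-z0-9_])"
    elif mode == "StrictFull":
        # Surrounded only by whitespace (or the text boundary: an assumption).
        pattern = r"(?<!\S)" + escaped + r"(?!\S)"
    else:
        raise ValueError(f"unknown match mode: {mode}")
    return [m.start() for m in re.finditer(pattern, text)]


if __name__ == "__main__":
    sample = "foo foo_bar (foo) xfoo"
    for mode in ("Partial", "Full", "StrictFull"):
        print(mode, find_matches(sample, "foo", mode))
```

In the sample string, ``Partial`` matches all four occurrences of `foo`, ``Full`` rejects the ones inside `foo_bar` and `xfoo`, and ``StrictFull`` additionally rejects `(foo)` because of the parentheses.
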

diff --git a/_sources/dev_guide/migration/migration-workflow.rst b/_sources/dev_guide/migration/migration-workflow.rst
@@ -113,7 +113,7 @@ This signature will be used later for validation after migration.
 Enable CodePin with the ``–enable-codepin`` option.
 
 For detailed information about debugging using the CodePin tool, refer to
-`Debug Migrated Code Runtime Behavior <https://www.intel.com/content/www/us/en/docs/dpcpp-compatibility-tool/developer-guide-reference/2024-1/debug-with-codepin.html>`_.
+`Debug Migrated Code Runtime Behavior <https://www.intel.com/content/www/us/en/docs/dpcpp-compatibility-tool/developer-guide-reference/2024-2/debug-with-codepin.html>`_.
 
 Configure the Tool
 ******************
@@ -425,7 +425,7 @@ project signature will be logged during the execution time.
 The signature contains the data value of each execution checkpoint, which can be verified manually or with an auto-analysis tool.
 
 For detailed information about debugging using the CodePin tool, refer to
-`Debug Migrated Code Runtime Behavior <https://www.intel.com/content/www/us/en/docs/dpcpp-compatibility-tool/developer-guide-reference/2024-1/debug-with-codepin.html>`_.
+`Debug Migrated Code Runtime Behavior <https://www.intel.com/content/www/us/en/docs/dpcpp-compatibility-tool/developer-guide-reference/2024-2/debug-with-codepin.html>`_.
 
 Optimize Your Code
 ------------------
@@ -437,7 +437,7 @@ code to improve for optimizing your application performance.
 Additional hardware- or library-specific optimization information is available:
 
 * For detailed information about optimizing your code for Intel GPUs, refer to
-  the `oneAPI GPU Optimization Guide <https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/current/overview.html>`_.
+  the `oneAPI GPU Optimization Guide <https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-2/overview.html>`_.
 * For detailed information about optimizing your code for AMD GPUs, refer to the
   `Codeplay AMD GPU Performance Guide <https://developer.codeplay.com/products/oneapi/amd/2024.0.2/guides/performance/introduction>`_.
 * For detailed information about optimizing your code for NVIDIA GPUS, refer to

diff --git a/_sources/dev_guide/reference/command-line-options-ref/alpha-list.rst b/_sources/dev_guide/reference/command-line-options-ref/alpha-list.rst
@@ -318,6 +318,12 @@ The following table lists |tool_name| command line options in alphabetical order
      - .. include:: /_include_files/options_def.rst
           :start-after: desc-use-explicit-namespace:
           :end-before: end-use-explicit-namespace:
+   * - .. include:: /_include_files/options_def.rst
+          :start-after: opt-use-syclcompat:
+          :end-before: desc-use-syclcompat:
+     - .. include:: /_include_files/options_def.rst
+          :start-after: desc-use-syclcompat:
+          :end-before: end-use-syclcompat:
    * - .. include:: /_include_files/options_def.rst
           :start-after: opt-usm-level:
           :end-before: desc-usm-level:

diff --git a/_sources/dev_guide/reference/diagnostic_ref/dpct1033.rst b/_sources/dev_guide/reference/diagnostic_ref/dpct1033.rst
@@ -27,7 +27,7 @@ Suggestions to Fix
 Set user-defined direction numbers to the basic Sobol generator and use it as
 Scrambled Sobol generator.
 
-See the `Random Number Generators <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/engines-basic-random-number-generators.html>`_ topic for more information.
+See the `Random Number Generators <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/engines-basic-random-number-generators.html>`_ topic for more information.
 
 For example, this original CUDA\* code:
 

diff --git a/_sources/dev_guide/reference/diagnostic_ref/dpct1036.rst b/_sources/dev_guide/reference/diagnostic_ref/dpct1036.rst
@@ -28,4 +28,4 @@ Suggestions to Fix
 
 Rewrite this code manually by using a supported random number generator.
 
-See the `Random Number Generators <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/engines-basic-random-number-generators.html>`_ topic for more information.
+See the `Random Number Generators <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/engines-basic-random-number-generators.html>`_ topic for more information.
diff --git a/_sources/dev_guide/reference/diagnostic_ref/dpct1045.rst b/_sources/dev_guide/reference/diagnostic_ref/dpct1045.rst
@@ -24,7 +24,7 @@ Suggestions to Fix
 If the matrix type in used is:
 
 * Supported by the routine: ignore this warning.
-* Not supported by the routine: manually fix the code according to `sparse-blas-routines <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/sparse-blas-routines.html>`_.
+* Not supported by the routine: manually fix the code according to `sparse-blas-routines <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/sparse-blas-routines.html>`_.
 
 For example, this original CUDA\* code:
 

diff --git a/_sources/dev_guide/reference/diagnostic_ref/dpct1046.rst b/_sources/dev_guide/reference/diagnostic_ref/dpct1046.rst
@@ -30,7 +30,7 @@ Use a supported data type to rewrite the code.
 Suggestions to Fix
 ------------------
 
-Please refer to the `gemm topic <https://www.intel.com/content/www/us/en/develop/documentation/oneapi-mkl-dpcpp-developer-reference/top/blas-routines/blas-level-3-routines/gemm.html>`_
+Please refer to the `gemm topic <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/gemm.html>`_
 of the Intel® oneAPI Math Kernel Library (oneMKL) - Data Parallel C++ Developer
 Reference for supported data types to fix the code manually.
 

diff --git a/_sources/dev_guide/reference/diagnostic_ref/dpct1071.rst b/_sources/dev_guide/reference/diagnostic_ref/dpct1071.rst
@@ -34,4 +34,4 @@ If the placement is incorrect, you may need to manually add the necessary ``set_
 statements before the call to ``commit()``.
 
 Refer to the
-`descriptor<precision, domain>::set_value function <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/descriptor-precision-domain-set-value.html>`_ for more information.
+`descriptor<precision, domain>::set_value function <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/descriptor-precision-domain-set-value.html>`_ for more information.
diff --git a/_sources/dev_guide/reference/diagnostic_ref/dpct1075.rst b/_sources/dev_guide/reference/diagnostic_ref/dpct1075.rst
@@ -71,4 +71,4 @@ correctly and are using the correct queue parameter. Fix the code if needed by
 adding missing commit calls and adjusting queue parameters.
 
 Refer to the
-`descriptor<precision, domain>::commit function <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/descriptor-precision-domain-commit.html>`_ for more information.
+`descriptor<precision, domain>::commit function <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/descriptor-precision-domain-commit.html>`_ for more information.