Binary file added _images/DataFlowGraph.png
45 changes: 36 additions & 9 deletions _sources/dev_guide/frequently-asked-questions-faq.rst
@@ -4,7 +4,7 @@ Frequently Asked Questions

**General Information**

* `How do I migrate source files that use C++11 or newer standard features on Linux\* and Windows\*?`_
* `How do I migrate source files that use C++20 or newer standard features on Linux\* and Windows\*?`_
* `How do I migrate files on Windows when using a CMake project?`_
* `How is the migrated code formatted?`_
* `Why does the compilation database not contain all source files in the project?`_
@@ -27,24 +27,25 @@ Frequently Asked Questions
General Information
-------------------

How do I migrate source files that use C++11 or newer standard features on Linux\* and Windows\*?
How do I migrate source files that use C++20 or newer standard features on Linux\* and Windows\*?
*************************************************************************************************

On Linux, the default C++ standard for |tool_name|'s
parser is C++98, with some C++11 features
accepted. If you want to enable other C++11 or newer standard
features in |tool_name|, you need to add
parser is C++17. If you want to enable newer standard features
in |tool_name|, you need to add
the ``--extra-arg="-std=<value>"`` option to the
command line. The supported values are:

- ``c++11``
- ``c++14``
- ``c++17``
- ``c++20``
- ``c++23``
- ``c++26``

On Windows, the default C++ standard for |tool_name|'s
parser is C++14. If you want to enable C++17
features in |tool_name|, you need to add
the option ``--extra-arg="-std=c++17"`` to the command line.
parser is C++17. If you want to enable C++20 features
in |tool_name|, you need to add
the option ``--extra-arg="-std=c++20"`` to the command line.
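To confirm which language standard a ``-std=<value>`` setting actually selects, one option (an illustration only, not a |tool_name| feature) is to compile a small probe that inspects the predefined ``__cplusplus`` macro with the same ``-std`` value. The helper names ``standard_name`` and ``current_standard`` are hypothetical. The macro values for C++11 through C++23 are fixed by the ISO standards; because the C++26 value is not final, anything above the C++23 value is reported as newer than C++23.

```cpp
#include <string>

// Hypothetical helper: maps a __cplusplus value to the standard it denotes.
// C++11 = 201103L, C++14 = 201402L, C++17 = 201703L, C++20 = 202002L, C++23 = 202302L.
std::string standard_name(long value) {
    if (value > 202302L) return "newer than c++23";  // e.g. experimental -std=c++2c modes
    if (value == 202302L) return "c++23";
    if (value >= 202002L) return "c++20";
    if (value >= 201703L) return "c++17";
    if (value >= 201402L) return "c++14";
    if (value >= 201103L) return "c++11";
    return "c++98";
}

// Standard that this translation unit was compiled with.
std::string current_standard() { return standard_name(__cplusplus); }
```

With a compiler that fully supports C++20, calling ``current_standard()`` from code built with ``-std=c++20`` is expected to return ``"c++20"``.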

How do I migrate files on Windows when using a CMake project?
*************************************************************
@@ -331,6 +332,32 @@ Based on these language standards |tool_name| emits the parsing error.

You may need to adjust the source code.

How do I resolve migration failure with "fatal error: 'cmath' file not found" in Linux?
***************************************************************************************

The problem stems from a missing include path for the C++ standard library headers.
|tool_name| is designed to automatically detect the appropriate version of the C++ headers by checking the compiler packages in ``/usr/lib/gcc/x86_64-linux-gnu`` and the C++ headers in ``/usr/include/c++``.
In the following example, the tool tries to use the version 12 C++ headers because a version 12 compiler package is installed, but it fails because the version 12 headers do not exist.

.. code-block::
:linenos:

ls /usr/lib/gcc/x86_64-linux-gnu
11 12
ls /usr/include/c++
11

To fix this issue, install either the version 12 ``g++`` package or the version 12 ``libstdc++`` development package.

.. code-block::
:linenos:

sudo apt install g++-12
# or
sudo apt install libstdc++-12-dev

If your installation has different versions, compare the output of ``ls /usr/lib/gcc/x86_64-linux-gnu`` and ``ls /usr/include/c++``, and install ``g++-XX`` or ``libstdc++-XX-dev`` for each version that appears in the first listing but not in the second.
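The manual comparison of the two directory listings can be sketched in C++17 with ``std::filesystem``. This is a minimal illustration, not |tool_name|'s actual detection logic: the helper names ``list_version_dirs`` and ``missing_header_versions`` are hypothetical, and the sketch assumes the Debian/Ubuntu layout shown in the example, where both directories contain one subdirectory per GCC major version.

```cpp
#include <filesystem>
#include <set>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Names of the subdirectories of `dir` (for example {"11", "12"}).
// Returns an empty set if `dir` does not exist or cannot be read.
std::set<std::string> list_version_dirs(const fs::path& dir) {
    std::set<std::string> versions;
    std::error_code ec;
    for (fs::directory_iterator it(dir, ec), end; !ec && it != end; it.increment(ec)) {
        std::error_code type_ec;
        if (it->is_directory(type_ec)) versions.insert(it->path().filename().string());
    }
    return versions;
}

// Versions that have a compiler package (first listing) but no matching
// header directory (second listing).
std::set<std::string> missing_header_versions(const std::set<std::string>& gcc_versions,
                                               const std::set<std::string>& header_versions) {
    std::set<std::string> missing;
    for (const auto& version : gcc_versions)
        if (header_versions.count(version) == 0) missing.insert(version);
    return missing;
}
```

On the example system, ``missing_header_versions(list_version_dirs("/usr/lib/gcc/x86_64-linux-gnu"), list_version_dirs("/usr/include/c++"))`` returns ``{"12"}``, which points to installing ``g++-12`` or ``libstdc++-12-dev``.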

How do I resolve incorrect runtime behavior for dpct::dev_mgr and dpct:mem_mgr in a library project that is loaded more than once in another application?
***********************************************************************************************************************************************************

66 changes: 46 additions & 20 deletions _sources/dev_guide/migration/debug-with-codepin.rst
@@ -149,17 +149,17 @@ After migration, there will be two files: ``dpct_output_codepin_sycl/example.dp.
q_ct1.memcpy(d_a, h_a, vectorSize * 12);

// Launch the CUDA kernel
dpct::experimental::gen_prolog_API_CP(
"example.cu:38:3(SYCL)", &q_ct1,
VAR_SCHEMA_0, (long *)&d_a, VAR_SCHEMA_1, (long *)&d_result);
dpctexp::codepin::gen_prolog_API_CP(
"vectorAdd:example.cu:24:9",
&q_ct1, "d_a", d_a, "d_result", d_result);
q_ct1.parallel_for(
sycl::nd_range<3>(sycl::range<3>(1, 1, 4), sycl::range<3>(1, 1, 4)),
[=](sycl::nd_item<3> item_ct1) { vectorAdd(d_a, d_result, item_ct1); });

// Copy result from device to host
dpct::experimental::gen_epilog_API_CP(
"example.cu:38:3(SYCL)", &q_ct1,
VAR_SCHEMA_0, (long *)&d_a, VAR_SCHEMA_1, (long *)&d_result);
dpctexp::codepin::gen_epilog_API_CP(
"vectorAdd:example.cu:24:9",
&q_ct1, "d_a", d_a, "d_result", d_result);

q_ct1.memcpy(h_result, d_result, vectorSize * sizeof(sycl::int3)).wait();

@@ -212,15 +212,15 @@ After migration, there will be two files: ``dpct_output_codepin_sycl/example.dp.
cudaMemcpy(d_a, h_a, vectorSize * 12, cudaMemcpyHostToDevice);

// Launch the CUDA kernel
dpct::experimental::gen_prolog_API_CP(
"example.cu:38:3", 0, VAR_SCHEMA_0,
(long *)&d_a, VAR_SCHEMA_1, (long *)&d_result);
dpctexp::codepin::gen_prolog_API_CP(
"vectorAdd:example.cu:24:9", 0,
"d_a", d_a, "d_result", d_result);
vectorAdd<<<1, 4>>>(d_a, d_result);

// Copy result from device to host
dpct::experimental::gen_epilog_API_CP(
"example.cu:38:3", 0, VAR_SCHEMA_0,
(long *)&d_a, VAR_SCHEMA_1, (long *)&d_result);
dpctexp::codepin::gen_epilog_API_CP(
"vectorAdd:example.cu:24:9", 0,
"d_a", d_a, "d_result", d_result);
cudaMemcpy(h_result, d_result, vectorSize * sizeof(int3),
cudaMemcpyDeviceToHost);

@@ -254,6 +254,9 @@ the following execution log files will be generated.
[
{
"ID": "example.cu:26:3:prolog",
"Device Name": "GPU",
"Device ID": "0",
"Stream Address": "0xe4bb30",
"Free Device Memory": "16374562816",
"Total Device Memory": "16882663424",
"Elapse Time(ms)": "0",
@@ -288,6 +291,9 @@ the following execution log files will be generated.
[
{
"ID": "example.cu:26:3:prolog",
"Device Name": "GPU",
"Device ID": "0",
"Stream Address": "0x3fea40",
"Free Device Memory": "0",
"Total Device Memory": "31023112192",
"Elapse Time(ms)": "0",
@@ -322,6 +328,9 @@ programs start to diverge from one another.
Analyse the CodePin Result
--------------------------

CodePin Report
~~~~~~~~~~~~~~

codepin-report.py (which can also be invoked with ``dpct --codepin-report`` or ``c2s --codepin-report``) is a feature of
the compatibility tool that consumes the execution log files of both the CUDA and SYCL programs and analyzes them automatically.
codepin-report.py identifies inconsistent data values and reports execution statistics.
@@ -349,12 +358,29 @@ Following is an example of the analysis report.
.. code-block::

CodePin Summary
Totally APIs count, 2
Consistently APIs count, 2
Most Time-consuming Kernel(CUDA), example.cu:26:3:epilog, time:8.2316
Most Time-consuming Kernel(SYCL), example.cu:26:3:epilog, time:10.2575
Peak Device Memory Used(CUDA), 508100608
Peak Device Memory Used(SYCL), 31023112192
Total API count, 2
Consistent API count, 0
Most Time-consuming Kernel(CUDA), vectorAdd:example.cu:24:5:epilog, time:16.8069
Most Time-consuming Kernel(SYCL), vectorAdd:example.cu:24:5:prolog, time:18.3240
Peak Device Memory Used(CUDA), 445644800
Peak Device Memory Used(SYCL), 540689534976
CUDA Meta Data ID, SYCL Meta Data ID, Type, Detail
example.cu:26:3:prolog,example.cu:26:3:prolog,Data value,[WARNING: METADATA MISMATCH] The pair of prolog data example.cu:26:3:prolog are mismatched,
and the corresponding pair of epilog data matches. This mismatch may be caused by the initialized memory or argument used in the API example.cu.
vectorAdd:example.cu:24:5:epilog,vectorAdd:example.cu:24:5:epilog,Data value,The location of failed ID Errors occurred during comparison: d_a->"Data"->[3]->"Data"->[0]->"x"->"Data"->[0] and [ERROR: DATA VALUE MISMATCH] the CUDA value 1 differs from the SYCL value 26518016.; d_result->"Data"->[3]->"Data"->[0]->"x"->"Data"->[0] and [ERROR: DATA VALUE MISMATCH] the CUDA value 2 differs from the SYCL value 26518017.
vectorAdd:example.cu:24:5:prolog,vectorAdd:example.cu:24:5:prolog,Data value,[WARNING: METADATA MISMATCH] The pair of prolog data vectorAdd:example.cu:24:5:prolog are mismatched, and the corresponding pair of epilog data matches. This mismatch may be caused by the initialized memory or argument used in the API vectorAdd.
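
The example log entries shown earlier report 16882663424 bytes of total and 16374562816 bytes of free device memory for the CUDA run, and 16882663424 - 16374562816 = 508100608, which equals the ``Peak Device Memory Used(CUDA)`` value in the earlier version of the summary; the SYCL entry (31023112192 total, 0 free) likewise matches its earlier peak value. Based only on these numbers, the following minimal sketch assumes the peak is the largest difference between total and free device memory across all checkpoints. This is an inference from the example, not documented behavior of codepin-report.py, and ``MemorySample`` and ``peak_device_memory_used`` are hypothetical names.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical record holding the two memory fields of one CodePin log entry.
struct MemorySample {
    std::uint64_t total_bytes;  // "Total Device Memory"
    std::uint64_t free_bytes;   // "Free Device Memory"
};

// Assumed definition: the peak is the largest (total - free) over all checkpoints.
std::uint64_t peak_device_memory_used(const std::vector<MemorySample>& samples) {
    std::uint64_t peak = 0;
    for (const auto& s : samples)
        if (s.total_bytes >= s.free_bytes)
            peak = std::max(peak, s.total_bytes - s.free_bytes);
    return peak;
}
```

With the sample entries shown earlier, the function returns 508100608 for the CUDA log and 31023112192 for the SYCL log.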

Data Flow Graph
~~~~~~~~~~~~~~~

codepin-report.py can generate a data flow graph for
kernels with the ``--generate-data-flow-graph`` option. The data flow graph visualizes kernel executions and compares the CUDA and SYCL results, highlighting execution mismatches between the CUDA and SYCL code.
In the data flow graph, each kernel execution and its input and output arguments are grouped into a layer that presents the run status of that kernel execution. The values of input and output arguments are tagged with version information in the form ``V<num>``. The initial version is tagged as V0, and each time the value of an argument is updated, its version number is incremented. If the CUDA and SYCL results of a kernel execution do not match, the mismatched argument nodes are colored red.

.. figure:: DataFlowGraph.png
:alt: DataFlowGraph
:align: center

The figure above shows the data flow graph of the ``vectorAdd`` example, which consists of a title and an execution layer. The execution layer presents a kernel execution and its inputs and outputs. The kernel node shows that kernel ``vectorAdd`` is executed on a stream of the device named GPU0, along with the kernel's execution time and source location. All input arguments (the ``d_a`` node and the top ``d_result`` node) are tagged with V0, indicating their initial values. The output argument (the bottom ``d_result`` node) is tagged with V1 because ``d_result`` is both an input and an output argument, and the kernel changes its value.

The nodes ``d_a:V0``, ``d_result:V0``, and ``d_result:V1`` are colored red, indicating a value mismatch between the CUDA and SYCL runs. In this case, the mismatch in the result value is caused by mismatched input argument values, which in turn may be caused by differences in memory initialization behavior between CUDA and SYCL, as the report states.

The data flow graph provides a clear view of the execution process, making it easy to identify discrepancies and to track how argument values change across kernel executions.
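The ``V<num>`` versioning rule can be modeled with a small tracker. This is a hypothetical sketch, not codepin-report.py's implementation: the class ``ArgumentVersions`` and its methods are illustrative names, argument values are simplified to ``std::vector<int>``, and only changes made by kernel executions bump the version (values written outside kernels are ignored).

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical model of the "V<num>" tags: each argument starts at V0 and its
// version is incremented whenever a kernel execution changes its value.
class ArgumentVersions {
public:
    // Records the value an argument holds before a kernel runs; returns its tag.
    std::string before_kernel(const std::string& name, const std::vector<int>& value) {
        auto it = state_.find(name);
        if (it == state_.end())
            it = state_.emplace(name, Entry{0, value}).first;  // first sighting: V0
        return tag(name, it->second.version);
    }

    // Records the value after the kernel; a changed value gets a new version.
    // Throws std::out_of_range if before_kernel was never called for `name`.
    std::string after_kernel(const std::string& name, const std::vector<int>& value) {
        Entry& entry = state_.at(name);
        if (value != entry.value) {
            ++entry.version;
            entry.value = value;
        }
        return tag(name, entry.version);
    }

private:
    struct Entry {
        int version;
        std::vector<int> value;
    };

    static std::string tag(const std::string& name, int version) {
        return name + ":V" + std::to_string(version);
    }

    std::map<std::string, Entry> state_;
};
```

For the ``vectorAdd`` example, ``d_a`` stays at ``d_a:V0`` while ``d_result`` moves from ``d_result:V0`` to ``d_result:V1`` once the kernel writes new values into it, which matches the node labels described for the figure.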
9 changes: 6 additions & 3 deletions _sources/dev_guide/migration/migration-rules.rst
@@ -56,14 +56,17 @@ Migration rules are specified in YAML files. A single rule file may contain mult
- Required. Specifies the priority of the rule: ``Takeover`` > ``Default`` > ``Fallback``.
When there are rule conflicts, the rule with higher priority will take precedence.
* - Kind
- ``Macro`` | ``API`` | ``Header`` | ``Type`` | ``Class`` | ``Enum`` | ``DisableAPIMigration`` | ``PatternRewriter`` | ``CMakeRule``
- ``Macro`` | ``API`` | ``Header`` | ``Type`` | ``Class`` | ``Enum`` | ``DisableAPIMigration`` | ``PatternRewriter`` | ``CMakeRule`` | ``PythonRule``
- Required. Specifies the rule type.
* - CmakeSyntax
- String value
- Required. Specify the CMake syntax name that will be migrated. Use a unique name for the CMake syntax.
* - PythonSyntax
- String value
- Optional. Specify the Python syntax name that will be migrated. Use a unique name for the Python syntax.
* - MatchMode
- ``Partial`` | ``Full``
- Required. Specify the match mode with full word match or partial word match. If not specified, partial match mode will be used.
- ``Partial`` | ``Full`` | ``StrictFull``
- Optional. Specify the match mode: partial word match, full word match, or strict full word match. If not specified, partial match mode is used. In ``Partial`` mode, the matched string can be surrounded by arbitrary characters, including whitespace. In ``Full`` mode, the matched string must not be surrounded by identifier characters (letters, digits, and underscores). In ``StrictFull`` mode, the matched string must be surrounded only by whitespace characters.
* - In
- String value
- Required. Specifies the target name in the input source code.
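The three ``MatchMode`` values described in the table can be illustrated with a small matcher. This is a minimal sketch of the documented semantics, not the tool's implementation; the ``matches`` function and its helpers are hypothetical, and the sketch assumes that the start or end of the text counts as an acceptable boundary in every mode.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical mirror of the rule file's MatchMode values.
enum class MatchMode { Partial, Full, StrictFull };

// Letters, digits, and underscore, as listed in the MatchMode description.
bool is_identifier_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
}

// Checks one neighboring character; '\0' stands for start or end of text (assumed OK).
bool neighbor_ok(char c, MatchMode mode) {
    if (c == '\0') return true;
    switch (mode) {
        case MatchMode::Partial:    return true;
        case MatchMode::Full:       return !is_identifier_char(c);
        case MatchMode::StrictFull: return std::isspace(static_cast<unsigned char>(c)) != 0;
    }
    return false;
}

// True if `pattern` occurs in `text` at a position accepted by `mode`.
bool matches(const std::string& text, const std::string& pattern, MatchMode mode) {
    if (pattern.empty()) return false;
    for (std::size_t pos = text.find(pattern); pos != std::string::npos;
         pos = text.find(pattern, pos + 1)) {
        const std::size_t end = pos + pattern.size();
        const char before = pos > 0 ? text[pos - 1] : '\0';
        const char after = end < text.size() ? text[end] : '\0';
        if (neighbor_ok(before, mode) && neighbor_ok(after, mode)) return true;
    }
    return false;
}
```

For instance, ``bar`` in ``foo_bar(x)`` is found in ``Partial`` mode but not in ``Full`` mode, and ``bar`` in ``a = bar(x);`` is found in ``Full`` mode but not in ``StrictFull`` mode.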
6 changes: 3 additions & 3 deletions _sources/dev_guide/migration/migration-workflow.rst
@@ -113,7 +113,7 @@ This signature will be used later for validation after migration.
Enable CodePin with the ``--enable-codepin`` option.

For detailed information about debugging using the CodePin tool, refer to
`Debug Migrated Code Runtime Behavior <https://www.intel.com/content/www/us/en/docs/dpcpp-compatibility-tool/developer-guide-reference/2024-1/debug-with-codepin.html>`_.
`Debug Migrated Code Runtime Behavior <https://www.intel.com/content/www/us/en/docs/dpcpp-compatibility-tool/developer-guide-reference/2024-2/debug-with-codepin.html>`_.

Configure the Tool
******************
@@ -425,7 +425,7 @@ project signature will be logged during the execution time.
The signature contains the data value of each execution checkpoint, which can be verified manually or with an auto-analysis tool.

For detailed information about debugging using the CodePin tool, refer to
`Debug Migrated Code Runtime Behavior <https://www.intel.com/content/www/us/en/docs/dpcpp-compatibility-tool/developer-guide-reference/2024-1/debug-with-codepin.html>`_.
`Debug Migrated Code Runtime Behavior <https://www.intel.com/content/www/us/en/docs/dpcpp-compatibility-tool/developer-guide-reference/2024-2/debug-with-codepin.html>`_.

Optimize Your Code
------------------
@@ -437,7 +437,7 @@ code to improve for optimizing your application performance.
Additional hardware- or library-specific optimization information is available:

* For detailed information about optimizing your code for Intel GPUs, refer to
the `oneAPI GPU Optimization Guide <https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/current/overview.html>`_.
the `oneAPI GPU Optimization Guide <https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2024-2/overview.html>`_.
* For detailed information about optimizing your code for AMD GPUs, refer to the
`Codeplay AMD GPU Performance Guide <https://developer.codeplay.com/products/oneapi/amd/2024.0.2/guides/performance/introduction>`_.
* For detailed information about optimizing your code for NVIDIA GPUs, refer to
@@ -318,6 +318,12 @@ The following table lists |tool_name| command line options in alphabetical order
- .. include:: /_include_files/options_def.rst
:start-after: desc-use-explicit-namespace:
:end-before: end-use-explicit-namespace:
* - .. include:: /_include_files/options_def.rst
:start-after: opt-use-syclcompat:
:end-before: desc-use-syclcompat:
- .. include:: /_include_files/options_def.rst
:start-after: desc-use-syclcompat:
:end-before: end-use-syclcompat:
* - .. include:: /_include_files/options_def.rst
:start-after: opt-usm-level:
:end-before: desc-usm-level:
2 changes: 1 addition & 1 deletion _sources/dev_guide/reference/diagnostic_ref/dpct1033.rst
@@ -27,7 +27,7 @@ Suggestions to Fix
Set user-defined direction numbers to the basic Sobol generator and use it as
Scrambled Sobol generator.

See the `Random Number Generators <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/engines-basic-random-number-generators.html>`_ topic for more information.
See the `Random Number Generators <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/engines-basic-random-number-generators.html>`_ topic for more information.

For example, this original CUDA\* code:

2 changes: 1 addition & 1 deletion _sources/dev_guide/reference/diagnostic_ref/dpct1036.rst
@@ -28,4 +28,4 @@ Suggestions to Fix

Rewrite this code manually by using a supported random number generator.

See the `Random Number Generators <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/engines-basic-random-number-generators.html>`_ topic for more information.
See the `Random Number Generators <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/engines-basic-random-number-generators.html>`_ topic for more information.
2 changes: 1 addition & 1 deletion _sources/dev_guide/reference/diagnostic_ref/dpct1045.rst
@@ -24,7 +24,7 @@ Suggestions to Fix
If the matrix type in use is:

* Supported by the routine: ignore this warning.
* Not supported by the routine: manually fix the code according to `sparse-blas-routines <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/sparse-blas-routines.html>`_.
* Not supported by the routine: manually fix the code according to `sparse-blas-routines <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/sparse-blas-routines.html>`_.

For example, this original CUDA\* code:

2 changes: 1 addition & 1 deletion _sources/dev_guide/reference/diagnostic_ref/dpct1046.rst
@@ -30,7 +30,7 @@ Use a supported data type to rewrite the code.
Suggestions to Fix
------------------

Please refer to the `gemm topic <https://www.intel.com/content/www/us/en/develop/documentation/oneapi-mkl-dpcpp-developer-reference/top/blas-routines/blas-level-3-routines/gemm.html>`_
Please refer to the `gemm topic <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/gemm.html>`_
of the Intel® oneAPI Math Kernel Library (oneMKL) - Data Parallel C++ Developer
Reference for supported data types to fix the code manually.

2 changes: 1 addition & 1 deletion _sources/dev_guide/reference/diagnostic_ref/dpct1071.rst
@@ -34,4 +34,4 @@ If the placement is incorrect, you may need to manually add the necessary ``set_
statements before the call to ``commit()``.

Refer to the
`descriptor<precision, domain>::set_value function <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/descriptor-precision-domain-set-value.html>`_ for more information.
`descriptor<precision, domain>::set_value function <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/descriptor-precision-domain-set-value.html>`_ for more information.
2 changes: 1 addition & 1 deletion _sources/dev_guide/reference/diagnostic_ref/dpct1075.rst
@@ -71,4 +71,4 @@ correctly and are using the correct queue parameter. Fix the code if needed by
adding missing commit calls and adjusting queue parameters.

Refer to the
`descriptor<precision, domain>::commit function <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/current/descriptor-precision-domain-commit.html>`_ for more information.
`descriptor<precision, domain>::commit function <https://www.intel.com/content/www/us/en/docs/onemkl/developer-reference-dpcpp/2024-2/descriptor-precision-domain-commit.html>`_ for more information.