
Update docs

franz committed Nov 29, 2017
1 parent 980dd29 commit abd27df9a2a683827ca13998f37a13997e158a15
@@ -2,9 +2,9 @@ This file contains notes for making distribution packages of pocl.

ICD
---
Pocl should probably be built with ICD enabled for desktop
distributions. Pocl does not have an ICD loader, so a dependency
on one would be beneficial.
Pocl should probably be built with ICD enabled (``-DENABLE_ICD=ON`` CMake
option) for desktop distributions. Pocl does not ship its own ICD loader,
so a dependency on one would be beneficial.

CMake options for a distribution build
--------------------------------------
@@ -25,12 +25,6 @@ CMake options for a distribution build
files in build / source directories) will be ignored
and pocl will always look in installed paths only.

Target (host) CPU
-----------------
If not overridden, pocl uses LLVM (llc) to detect the host CPU, which
is used to determine processor ISA extensions (like SSE and AVX).
See: http://portablecl.org/docs/html/env_variables.html

Mesa (OpenGL) interoperability
------------------------------
On some current (Jan 2014) Linux distributions, Mesa is built with LLVMpipe.
@@ -4,8 +4,10 @@ Information for Developers
Using cmake to build & install pocl
-----------------------------------

Most of the important stuff on using cmake is in the INSTALL file. A few
additional items:
Most of the important information on using cmake is in the install document;
see :ref:`pocl-install`.

A few additional items:

The

@@ -14,15 +16,18 @@ The
command must point to ocl-vendors in the cmake *build* directory, not the
pocl source directory.

Testing is done using either "make test" or invoking "ctest" directly;
"make check" does not work. Invoke ctest with -jX option to run X tests
in parallel.
You can run the tests or built examples using ``ctest`` directly;
``ctest --print-labels`` prints the available labels (testsuites).
Invoke ctest with the ``-jX`` option to run X tests in parallel.

"make check" will invoke ctest with tier-1 testsuites.
See :ref:`maintenance-policy` for details.
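
A few typical invocations from the cmake build directory. ``<label>`` is a
placeholder for one of the labels printed by ``--print-labels``, and the job
count of 8 is arbitrary::

    # List the labels (testsuites) known to this build
    ctest --print-labels

    # Run all tests, 8 in parallel
    ctest -j8

    # Run only the tests whose label matches <label>
    ctest -L <label> -j8

    # Run the tier-1 testsuites
    make check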

Testsuite
----------

Before changes are committed to the mainline, all tests in the 'make
check' suite should pass::
check' tier-1 suite should pass::

    make check

@@ -140,10 +145,8 @@ By default, pocl build system compiles the kernel libraries for
the host CPU architecture, to be used by 'basic' and 'pthread' devices.

LLVM is used to detect the CPU variant to be used as target. This
can be overridden by passing LLC_HOST_CPU to './configure'.
Valid options are best documented in the output of::

    llvm-as /dev/null | llc -mcpu=help
can be overridden by passing ``-DLLC_HOST_CPU=...`` to CMake. See the
documentation of the LLC_HOST_CPU build option.
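
For example, you can list the accepted CPU names and then pass one of them
explicitly. ``haswell`` is only an illustrative value; pick one from the
list printed by your LLVM::

    # Print the CPUs llc knows for its default target (empty input module)
    llvm-as < /dev/null | llc -mcpu=help

    # Configure pocl with an explicit host CPU instead of autodetection
    cmake -DLLC_HOST_CPU=haswell ..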

Cross-compilation where 'build' is different from 'host' has not been
tested.
@@ -6,10 +6,11 @@ listed below. The variables are helpful both when using and when developing
pocl.

- **POCL_AFFINITY**
Linux-only. If set to 1, each thread of the pthread CPU driver sets its
affinity to its index. This may be useful with very long running kernels,
or when using subdevices (lets any idle cores enter deeper sleep). Defaults
to 0 (most people don't need this).
Linux-only, specific to the pthread driver. If set to 1, each thread of
the pthread CPU driver sets its affinity to its index. This may be
useful with very long running kernels, or when using subdevices
(it lets any idle cores enter deeper sleep). Defaults to 0 (most
people don't need this).

- **POCL_BUILDING**

@@ -34,8 +35,8 @@ pocl.

The old way (setting POCL_DEBUG to 1) has been updated to support categories.
Selecting categories limits the amount of debug messages produced. Current options are:
general,memory,llvm,events,cache,locking,refcounts,timing,hsa,tce,all.
Note: setting POCL_DEBUG to 1 still works.
error,warning,general,memory,llvm,events,cache,locking,refcounts,timing,hsa,tce,all.
Note: setting POCL_DEBUG to 1 still works and equals error+warning+general.

- **POCL_DEBUG_LLVM_PASSES**

@@ -18,8 +18,7 @@ Frontend/Clang
* pipes (WIP)
* device-side enqueue

* cl_khr_f16: half precision float literals

* cl_khr_f16: half precision support (with the exception of vload_half / vstore_half)

Unimplemented host side functions
---------------------------------
@@ -1,3 +1,5 @@
.. _pocl-install:

============
Installation
============
@@ -9,21 +11,28 @@ In order to build pocl, you need the following support libraries and
tools:

* Latest released version of LLVM & Clang
* GNU make
* GNU make or ninja
* libtool dlopen wrapper files (e.g. libltdl3-dev in Debian)
* pthread (should be installed by default)
* hwloc v1.0 or newer (e.g. libhwloc-dev)
* pkg-config
* cmake


There are Dockerfiles available for a few of the most common Linux
distributions in ``tools/docker``; looking into them might be helpful.

Clang / LLVM Notes
------------------

**IMPORTANT NOTE!** Some platforms (TCE and possibly HSA) require that
**IMPORTANT NOTE!** Some targets (TCE and possibly HSA) require that
you compile & build LLVM with RTTI on. It can be enabled on the cmake command
line, as follows:

**Supported versions**
    cmake .... -DLLVM_ENABLE_RTTI=ON -DLLVM_ENABLE_EH=ON ....

Supported LLVM versions
~~~~~~~~~~~~~~~~~~~~~~~~~

Note that pocl aims to support **the latest LLVM version** at the time
of pocl release, **plus the previous** LLVM version. All older LLVM
@@ -47,38 +56,117 @@ The build+install is the usual CMake way::
To see the default detected values, run ``cmake ..`` without any options;
it will produce a summary.

CMake variables
===============

Since pocl is a compiler, it both compiles (producing code) and is
compiled (it consists of code). This distinction is typically called
"host" and "target": the host is where pocl is running, the target is
where the OpenCL code will be running. These two systems can be wildly
different.

The host compiler used to compile pocl can be GCC or Clang; the target
compiler is always Clang+LLVM, since pocl uses Clang/LLVM internally.
For the host compiler, you should use the one your LLVM was compiled
with, because the LLVM-related parts of pocl take LLVM's CXXFLAGS from
llvm-config and pass them to the host compiler.
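
The following sketch checks those flags and then configures pocl with a
matching host compiler. It assumes LLVM was built with Clang and that
``llvm-config`` is in ``PATH``. Use ``CC=gcc CXX=g++`` instead if your LLVM
was built with GCC::

    # C++ flags that the LLVM-related parts of pocl take from llvm-config
    llvm-config --cxxflags

    # CMake reads CC/CXX only when a build directory is configured for the first time
    CC=clang CXX=clang++ cmake -DWITH_LLVM_CONFIG="$(command -v llvm-config)" ..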

CMake host flags
----------------

Compile C:
  ``CMAKE_C_FLAGS``, ``CMAKE_C_FLAGS_<build-type>``

Compile C++:
  ``CMAKE_CXX_FLAGS``, ``CMAKE_CXX_FLAGS_<build-type>``

TODO:
  ``HOST_LLC_FLAGS``

Convert assembler to object file:
  ``HOST_CLANG_FLAGS``

Post-process object file:
  ``HOST_LD_FLAGS``

Building kernels and the kernel library, i.e. target flags
------------------------------------------------------------

``EXTRA_KERNEL_FLAGS``
  is applied to all kernel library compilation commands, i.e. it is for
  language-independent options.

``EXTRA_KERNEL_{C,CL,CXX}_FLAGS``
  CMake variables for per-language options for kernel library compilation.
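
A configure line that sets both host and target flags might look like the
sketch below. The ``<...>`` values are placeholders, not recommendations.
``RELEASE`` is the upper-cased ``<build-type>`` for a Release build::

    cmake -DCMAKE_BUILD_TYPE=Release \
          -DCMAKE_C_FLAGS_RELEASE="<host C flags>" \
          -DCMAKE_CXX_FLAGS_RELEASE="<host C++ flags>" \
          -DEXTRA_KERNEL_FLAGS="<flags for all kernel library sources>" \
          -DEXTRA_KERNEL_CL_FLAGS="<flags for OpenCL C sources only>" \
          ..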


CMake: important options & features
CMake: other options & features
-------------------------------------

For multiple-item options, use ";" as separator (you'll have to escape it for bash).
Note that there are a few more packaging-related options described
in ``README.packaging``.

For multiple-item options like KERNELLIB_HOST_CPU_VARIANTS,
use ``;`` as the separator (you'll have to escape or quote it in bash).
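
The escaping matters because an unquoted ``;`` ends a shell command. The
hypothetical ``show_args`` helper below (not part of pocl) prints each
argument it receives on its own line. It shows that both the escaped form
and the quoted form reach the program as a single argument::

    # Hypothetical helper: print each received argument on its own line
    show_args() { printf '%s\n' "$@"; }

    # Escaped semicolon: stays inside a single argument
    show_args -DKERNELLIB_HOST_CPU_VARIANTS=core2\;haswell

    # Quoting the whole argument works as well
    show_args "-DKERNELLIB_HOST_CPU_VARIANTS=core2;haswell"

    # Unescaped, the shell would run "show_args ...=core2" and then try to
    # execute "haswell" as a separate command:
    #   show_args -DKERNELLIB_HOST_CPU_VARIANTS=core2;haswell

With cmake, the quoted form becomes, for example,
``cmake "-DKERNELLIB_HOST_CPU_VARIANTS=core2;haswell" ..``. Here ``core2`` and
``haswell`` are just sample names from ``llc -mcpu=help``.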

- ``-DWITH_LLVM_CONFIG=<path-to-llvm-config>``
**IMPORTANT** Path to an llvm-config binary.
This determines the LLVM installation used by pocl.
If not specified, pocl will try to find llvm-config in the
directories listed in the PATH environment variable (which usually
means your system LLVM) and link against that LLVM.

- ``-DSTATIC_LLVM`` enable this to link LLVM statically into pocl.
Note that you need LLVM built with static libs. This option might result
in much longer build/link times and much larger pocl library, but the
resulting libpocl will not require an LLVM installation to run.

- ``-DENABLE_ICD`` By default, pocl's build system will try to find an ICD
loader and build pocl as a dynamic library named "libpocl". Disabling this
option is useful if you want to avoid the ICD and build pocl directly as
the libOpenCL library. See also :ref:`linking-with-icd`.

- ``-DPOCL_INSTALL_<something>_DIR`` The equivalent of ``--bindir``,
``--sbindir`` etc. fine-tuning of paths for autotools. See the beginning
of the top-level CMakeLists.txt for all the variables.
- ``-DKERNELLIB_HOST_CPU_VARIANTS`` You can control which CPUs the
kernel library will be built for. Defaults to "native", which will be
converted to the build machine's CPU at build time. Available CPUs are
listed by ``llc -mcpu=help``; you can specify multiple CPUs, and pocl will
look for a kernel library for the runtime-detected CPU.

For x86(64) there is another possibility, ``distro``, which builds a few
preselected sse/avx variants covering 99.99% of x86 processors, and pocl
will use the most appropriate one at runtime, based on detected CPU features.
With ``distro``, the minimum requirement on CPU is SSE2.
Note that if ``CMAKE_INSTALL_PREFIX`` equals ``/usr`` then pocl.icd is
installed to ``/etc/OpenCL/vendors``, otherwise it's installed to
``${CMAKE_INSTALL_PREFIX}/etc/OpenCL/vendors``.

- ``-DLLC_HOST_CPU=<something>``
Defaults to auto-detection via ``llc``. Run ``llc -mcpu=help``
for valid values. The CPU type is required to compile
the "target" (kernel library) part of the CPU backend.

This variable overrides LLVM's autodetected host CPU at configure time.
It is useful when llc fails to detect the CPU (which often happens on
non-x86 platforms, or on x86 CPUs newer than the LLVM release).

Note that when this is set (it is set by default) and the
KERNELLIB_HOST_CPU_VARIANTS variable is not ``distro``,
pocl will first try to find a compiled kernel library
for the runtime-detected CPU, then fall back to LLC_HOST_CPU.
This works well if pocl is run where it was built,
or the actual CPU is in the KERNELLIB_HOST_CPU_VARIANTS list,
or the actual CPU is >= LLC_HOST_CPU feature-wise;
otherwise it will likely fail with an illegal instruction at runtime.

- ``-DKERNELLIB_HOST_CPU_VARIANTS`` You can control which CPUs the
"target" part of the CPU backend will be built for.
Unlike LLC_HOST_CPU, this variable is useful if you plan
to build for multiple CPUs. Defaults to "native", which is
automatically replaced by LLC_HOST_CPU.
Available CPUs are listed by ``llc -mcpu=help``. See above for
runtime CPU detection rules.

Note that there's another valid value on x86(64) platforms.
If set to ``distro``, the KERNELLIB_HOST_CPU_VARIANTS variable will be
set up with a few preselected SSE/AVX variants covering 99.99% of x86
processors, and the runtime CPU detection is slightly altered: pocl
will find the suitable compiled library based on detected CPU features,
so it cannot fail (at worst it will degrade to the SSE2 library).

- ``-DENABLE_TESTSUITES`` Which external (source outside pocl) testsuites to enable.
For the list of testsuites, see examples/CMakeLists.txt or the ``examples``
@@ -96,20 +184,26 @@ For multiple-item options, use ";" as separator (you'll have to escape it for ba
Builds Pocl as a fully conformant OpenCL implementation. Defaults to ON.
See :ref:`pocl-conformance` for details.

- ``-DENABLE_{A,L,T,UB}SAN`` - compiles pocl's host code (and tests and
examples) with various sanitizers. Using more than one sanitizer at
a time is untested. Using them together with ``-DENABLE_ICD=OFF`` is highly
recommended to avoid issues with the loading order of sanitizer libraries.

- ``-DENABLE_{CUDA,TCE,HSA}=ON/OFF`` - enable various (non-CPU) backends.
Usually requires some extra setup; see their documentation.

- ``-DPOCL_DEBUG_MESSAGES=ON`` - when disabled, pocl is compiled without
support for debug messages (the POCL_DEBUG env var).

- ``-DEXAMPLES_USE_GIT_MASTER=ON`` - when enabled, examples (external
programs in ``examples/`` directory) are built from their git branches
(if available), as opposed to default: building from release tars.
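
The option list above describes two selection rules: where ``pocl.icd``
gets installed, and which kernel library variant is used at runtime when
KERNELLIB_HOST_CPU_VARIANTS is not ``distro``. The small shell functions
below model both rules. ``icd_vendors_dir`` and ``pick_kernellib_cpu`` are
hypothetical illustrations written for this note, not part of pocl; the
real decisions are made by pocl's CMake files and runtime::

    # Hypothetical: where pocl.icd is installed for a given CMAKE_INSTALL_PREFIX
    icd_vendors_dir() {
      if [ "$1" = "/usr" ]; then
        echo "/etc/OpenCL/vendors"
      else
        echo "$1/etc/OpenCL/vendors"
      fi
    }

    # Hypothetical model of the non-distro lookup order: use the kernel library
    # built for the runtime-detected CPU if it is among the variants,
    # otherwise fall back to LLC_HOST_CPU.
    #   $1 = runtime-detected CPU
    #   $2 = KERNELLIB_HOST_CPU_VARIANTS (';'-separated)
    #   $3 = LLC_HOST_CPU
    pick_kernellib_cpu() {
      saved_ifs=$IFS
      IFS=';'
      for variant in $2; do
        if [ "$variant" = "$1" ]; then
          IFS=$saved_ifs
          echo "$1"
          return 0
        fi
      done
      IFS=$saved_ifs
      echo "$3"
    }

    icd_vendors_dir /usr                               # /etc/OpenCL/vendors
    icd_vendors_dir /opt/pocl                          # /opt/pocl/etc/OpenCL/vendors
    pick_kernellib_cpu haswell 'core2;haswell' core2   # haswell
    pick_kernellib_cpu skylake 'core2;haswell' core2   # core2 (fallback)

For a distribution package, these options might be combined as, e.g.,
``cmake -DENABLE_ICD=ON -DKERNELLIB_HOST_CPU_VARIANTS=distro -DCMAKE_INSTALL_PREFIX=/usr ..``.
In that case ``pocl.icd`` goes to ``/etc/OpenCL/vendors``, and the runtime
chooses among the preselected ``distro`` variants by CPU features instead
of using the fallback modelled above.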

LLVM-less build
---------------
See :ref:`pocl-without-llvm`


Building on Ubuntu 16.04 LTS
----------------------------

The Clang/LLVM 3.8 shipped with Ubuntu 16.04 should work with pocl.
Be sure to also install the 'libclang-3.8-dev' package in addition
to the 'clang-3.8' and 'llvm-3.8-dev' packages; otherwise cmake will
fail.
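
On Ubuntu 16.04 this amounts to something like the following. The
``/usr/bin/llvm-config-3.8`` path is an assumption based on Ubuntu's
versioned LLVM packaging, so adjust it if it differs on your system::

    sudo apt-get install clang-3.8 llvm-3.8-dev libclang-3.8-dev
    cmake -DWITH_LLVM_CONFIG=/usr/bin/llvm-config-3.8 ..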

Known build-time issues
-----------------------

@@ -2,24 +2,17 @@ Kernel compiler
---------------

The compilation of kernels in pocl is performed roughly as follows.
In release 0.9 the scripts (referred to below) were replaced by direct
LLVM API calls. The structure remains, e.g. calling script ``pocl-build`` was
replaced with calling function ``call_pocl_build()``. See ``lib/CL/pocl_llvm_api.cc``

#. Produce an LLVM bitcode of the single kernel function.
#. Produce an LLVM bitcode of the entire program.

The kernel compiler of pocl relies on the OpenCL C frontend of Clang
for parsing the kernel descriptions to LLVM bitcode. The output from
Clang is a description of the kernel function for a single work-item.
This is done using the 'preprocess' and 'emit-llvm' Clang actions. This
happens at clBuildProgram() time.

Done with the help of the ``pocl-build`` script that invokes Clang. See
``clBuildProgram.c``.

#. Link in the built-in functions.
#. Link in the built-in kernel library functions.

The OpenCL C builtin functions are precompiled to LLVM *bitcode* libraries
residing under ``lib/kernel/$TARGET``. These are linked to the kernel using
the ``llvm-link`` tool when the helper script ``pocl-workgroup`` is run (see the next item).
``link()`` from ``lib/llvmopencl/linker.cpp``. This too happens in ``clBuildProgram()``.

#. Produce the work-group function.

@@ -32,19 +25,21 @@ replaced with calling function ``call_pocl_build()``. See ``lib/CL/pocl_llvm_api
description and take care of the parallel execution of multiple kernel instances
using their scheduling hardware.

This part is performed when a kernel execution command is executed (see
``clEnqueueNDRangeKernel.c``). Only at this point are the work-group dimensions
known, after which it is possible to produce functions of the single kernel functions
that execute the whole work-group.
This part is performed by target-specific code when a kernel execution
command is scheduled. Only at this point are the work-group dimensions
known, after which it is possible to turn the single work-item kernel
function into a function that executes the whole work-group.

#. Code generation for the target.

The work-group function (which is still in LLVM IR) of the kernel, along with the launcher
functions, is finally converted to the machine code of the target device. This is done in
the device layer's implementation of the kernel run command. For example, see ``llvm_codegen()``
in ``lib/CL/devices/common.c``. This function generates a dynamically loaded object of the
work-group function for actually launching the kernel. The function is called from the CPU
device layer implementations (``pocl_basic_run()`` of ``lib/CL/devices/basic/basic.c``).
the device layer's implementation of the kernel run command (the same place where the
work-group function is generated). For example, see ``llvm_codegen()`` in ``lib/CL/devices/common.c``.
This function generates a dynamically loaded object of the work-group
function for actually launching the kernel. The function is called
from the CPU device layer implementations
(``pocl_basic_run()`` of ``lib/CL/devices/basic/basic.c``).


Work group function generation
@@ -72,6 +72,5 @@ Advantages:
* allocation of queues/events/command objects can be a lot faster

Disadvantages:
* memory allocated for those objects is never free()d;
it's only returned to allocation pool
* memory allocated for those objects is never free()d; it's only returned to allocation pool
* debugging tools will not detect use-after-free bugs on said objects
