@franz released this Sep 25, 2018

pocl 1.2

Highlights

  • LLVM 7.0 support
  • HWLOC 2.0 support
Pre-release

@franz released this Aug 15, 2018

pocl 1.2 Release Candidate 2

A few fixes are included since RC1.

Pre-release

@franz released this Aug 6, 2018

pocl 1.2 Release Candidate 1

Highlights

  • LLVM 7.0 support
  • HWLOC 2.0 support

@franz released this Mar 9, 2018

pocl 1.1

Highlights

  • LLVM 6.0 is now supported.

  • Reintroduced experimental SPIR LLVM bitcode support to pocl.
    Requires LLVM 5 or newer. New experimental feature: SPIR-V support;
    requires a working llvm-spirv converter. Currently pocl can only load
    SPIR-V binaries; it cannot output them.
    See docs/features.rst for more details.

  • The pocl cache has been refactored: it no longer uses LLVM file locks
    and relies entirely on system calls for proper synchronization.
    Additionally, cache file writes are now fdatasync()ed.

  • Improved kernel compilation time with a cold cache. The improvement
    depends on the sources; it is larger for big programs with many kernels.
    Luxmark now compiles in seconds instead of tens of seconds;
    internal pocl tests run in 30-50% less time.

  • The LLVM Scalarizer pass is now run only for SPMD devices. The performance
    impact varies across tests, but the gains appear to outweigh the losses.

  • Implemented an uninitialization callback for device drivers. It is
    triggered when the last cl_context is released. Currently only the
    CPU driver implements the callback.

  • Removed libpoclu from the installed files. This library contains helpers
    for pocl's internal tests; among the installed files, it was used only by
    poclcc, which has been updated to no longer rely on it.

  • POCL_MAX_WORK_GROUP_SIZE is now respected by all devices. This variable
    limits the reported maximum work-group (WG) size and per-dimension sizes;
    tuning the maximum WG size may improve performance through better cache
    locality.

  • CL_PLATFORM_VERSION now contains much more information about how
    pocl was built.

  • For users still building with Vecmathlib, performance should be back
    to the levels of pocl 0.14 (there was a large drop caused by a change
    in the -O0 optimization level of LLVM 5.0).

  • Improved support for ARM and ARM64 architectures. All internal tests
    now pass (on Cortex-A53 and Cortex-A15), although pocl is still far
    from full conformance.
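The cache bullet mentions that cache file writes are now fdatasync()ed. As a rough illustration only, here is a common POSIX pattern for crash-safe cache writes: write to a temporary file, flush it with fdatasync(), then rename() it over the final name so readers never see a partial file. This is an assumption about how such a cache can work, not pocl's actual cache code; write_cache_file() is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: write len bytes to path so that readers see either
 * the old file or the complete new one, never a partial write. */
static int write_cache_file(const char *path, const void *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmpXXXXXX", path);
    if (n < 0 || (size_t)n >= sizeof tmp)
        return -1;

    /* mkstemp() creates a uniquely named temp file next to the target. */
    int fd = mkstemp(tmp);
    if (fd < 0)
        return -1;

    const char *p = data;
    size_t left = len;
    while (left > 0) {
        ssize_t w = write(fd, p, left);
        if (w < 0)
            goto fail;
        p += w;
        left -= (size_t)w;
    }

    /* fdatasync() flushes the file data to stable storage before rename. */
    if (fdatasync(fd) != 0)
        goto fail;
    if (close(fd) != 0) {
        unlink(tmp);
        return -1;
    }

    /* rename() atomically replaces the target within one file system. */
    if (rename(tmp, path) != 0) {
        unlink(tmp);
        return -1;
    }
    return 0;

fail:
    close(fd);
    unlink(tmp);
    return -1;
}
```

Keeping the temporary file in the same directory as the target matters: rename() is only atomic within a single file system.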

Pre-release

@franz released this Mar 5, 2018

Changes since RC2:

  • Updated documentation
  • User-supplied LLC_HOST_CPU is now properly enforced
  • Updated Dockerfiles
  • Fixed several bugs
Pre-release

@franz released this Feb 26, 2018

Changes since RC1:

  • Updated documentation
  • CMake fixes and cleanups
  • Applied several patches from Debian pocl packagers
  • Fixed several bugs
  • Changed the reported local memory size to a more reasonable value
Pre-release

@franz released this Feb 22, 2018

pocl 1.1 Release Candidate 1

Highlights copied from CHANGES:

Highlights

  • LLVM 6.0 is now supported.

  • Reintroduced experimental SPIR LLVM bitcode support to pocl.
    Requires LLVM 5 or newer.

  • The pocl cache has been refactored: it no longer uses locking and relies
    entirely on system calls for proper synchronization. Additionally,
    cache file writes are now fdatasync()ed.

  • Improved kernel compilation time with a cold cache. The improvement
    depends on the sources; it is larger for big programs with many kernels.
    Luxmark now compiles in seconds instead of tens of seconds;
    internal pocl tests run in 30-50% less time.

  • The LLVM Scalarizer pass is now run only for SPMD devices. The performance
    impact varies across tests, but the gains appear to outweigh the losses.

  • Implemented an uninitialization callback for device drivers. It is
    triggered when the last cl_context is released. Currently only the
    CPU driver implements the callback.

  • Removed libpoclu from the installed files. This library contains helpers
    for pocl's internal tests; among the installed files, it was used only by
    poclcc, which has been updated to no longer rely on it.

  • POCL_MAX_WORK_GROUP_SIZE is now respected by all devices. This variable
    limits the reported maximum work-group (WG) size and per-dimension sizes;
    tuning the maximum WG size may improve performance through better cache
    locality.

  • CL_PLATFORM_VERSION now contains much more information about how
    pocl was built.

  • For users still building with Vecmathlib, performance should be back
    to the levels of pocl 0.14 (there was a large drop caused by a change
    in the -O0 optimization level of LLVM 5.0).
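To make the POCL_MAX_WORK_GROUP_SIZE bullet more concrete, here is a hypothetical sketch of how a driver might clamp its reported limits to the variable's value. The struct wg_limits and apply_max_wg_env() are invented for illustration and do not reflect pocl's internal data structures or parsing rules.

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical device limits; pocl's real data structures differ. */
struct wg_limits {
    size_t max_wg_size;       /* cf. CL_DEVICE_MAX_WORK_GROUP_SIZE */
    size_t max_item_sizes[3]; /* cf. CL_DEVICE_MAX_WORK_ITEM_SIZES */
};

/* Clamp the reported limits to POCL_MAX_WORK_GROUP_SIZE when it parses as
 * a nonzero decimal number; otherwise leave them unchanged. */
static void apply_max_wg_env(struct wg_limits *lim)
{
    const char *s = getenv("POCL_MAX_WORK_GROUP_SIZE");
    if (s == NULL || *s == '\0')
        return;

    char *end;
    unsigned long long v = strtoull(s, &end, 10);
    if (*end != '\0' || v == 0)
        return; /* ignore malformed or zero values */

    if (v < lim->max_wg_size)
        lim->max_wg_size = (size_t)v;
    /* No single dimension may exceed the total work-group size. */
    for (int i = 0; i < 3; ++i)
        if (lim->max_item_sizes[i] > lim->max_wg_size)
            lim->max_item_sizes[i] = lim->max_wg_size;
}
```

Under this sketch, with POCL_MAX_WORK_GROUP_SIZE=256 a device reporting 4096 / {4096, 4096, 64} would instead report 256 / {256, 256, 64}.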

@franz released this Dec 19, 2017

pocl 1.0

Highlights

  • Improved automatic local work-group sizing on kernel enqueue, taking
    into account standard constraints, SIMD width for vectorization as
    well as the number of compute units available on the device.
  • Support for NVIDIA GPUs via a new CUDA backend (currently experimental).
  • Removed support for BBVectorizer.
  • LLVM 5.0 is now supported.
  • A few build options have been added for distribution builds;
    see README.packaging.
  • Somewhat improved scalability in the CPU driver. CPUs with many cores,
    and programs using many work-items (WIs) with small kernels, can run
    somewhat faster.
  • Full conformance with the OpenCL 1.2 standard, enabled by default. There
    are some caveats, though; see the documentation.
  • When conformance is enabled, some kernel library functions might be
    slower than in previous releases.
  • pocl now reports OpenCL 1.2 instead of 2.0, except in HSA-enabled builds.
  • Updated format of pocl binaries, which is NOT backwards compatible.
    You'll need to clean any kernel caches.
  • Fixed several memory leaks.
  • Unresolved symbols (missing/misspelled functions, etc.) in a kernel now
    result in an error from clBuildProgram() instead of pocl silently ignoring
    them and then aborting at dlopen().
  • New env variable POCL_MEMORY_LIMIT=N limits the global memory size
    reported by pocl to N gigabytes.
  • New env variable POCL_AFFINITY (defaults to 0): if enabled, sets
    the affinity of each CPU driver pthread to a single core.
  • Improved AVX512 support (with LLVM 5.0). Note that even with LLVM 5.0
    there are still a few bugs (see pocl issue #555); AVX512 with LLVM 4.0 is
    much more broken and probably not worth trying.
  • The POCL_DEBUG env var has been revamped. Debug output can now be limited
    to these categories (or a combination of them): all,error,warning,general,
    memory,llvm,events,cache,locking,refcounts,timing,hsa,tce,cuda.
    The old setting POCL_DEBUG=1 now equals error+warning+general.
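A hypothetical sketch of turning a POCL_DEBUG value into a category bitmask. The category names come from the list above and "1" keeps its documented meaning of error+warning+general; the bit values, the DBG_* names and parse_debug_categories() are made up for illustration and are not pocl's implementation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical bit values, for illustration only. */
enum {
    DBG_ERROR     = 1 << 0,
    DBG_WARNING   = 1 << 1,
    DBG_GENERAL   = 1 << 2,
    DBG_MEMORY    = 1 << 3,
    DBG_LLVM      = 1 << 4,
    DBG_EVENTS    = 1 << 5,
    DBG_CACHE     = 1 << 6,
    DBG_LOCKING   = 1 << 7,
    DBG_REFCOUNTS = 1 << 8,
    DBG_TIMING    = 1 << 9,
    DBG_HSA       = 1 << 10,
    DBG_TCE       = 1 << 11,
    DBG_CUDA      = 1 << 12,
    DBG_ALL       = (1 << 13) - 1
};

static const struct {
    const char *name;
    uint32_t bit;
} dbg_categories[] = {
    {"all", DBG_ALL},         {"error", DBG_ERROR},
    {"warning", DBG_WARNING}, {"general", DBG_GENERAL},
    {"memory", DBG_MEMORY},   {"llvm", DBG_LLVM},
    {"events", DBG_EVENTS},   {"cache", DBG_CACHE},
    {"locking", DBG_LOCKING}, {"refcounts", DBG_REFCOUNTS},
    {"timing", DBG_TIMING},   {"hsa", DBG_HSA},
    {"tce", DBG_TCE},         {"cuda", DBG_CUDA},
};

/* Turn a comma-separated category list into a bitmask. "1" keeps its old
 * meaning (error+warning+general); unknown names are ignored. */
static uint32_t parse_debug_categories(const char *value)
{
    if (value == NULL)
        return 0;
    if (strcmp(value, "1") == 0)
        return DBG_ERROR | DBG_WARNING | DBG_GENERAL;

    uint32_t mask = 0;
    const char *p = value;
    while (*p != '\0') {
        size_t len = strcspn(p, ","); /* length of the next name */
        for (size_t i = 0; i < sizeof dbg_categories / sizeof dbg_categories[0]; ++i)
            if (strlen(dbg_categories[i].name) == len &&
                strncmp(p, dbg_categories[i].name, len) == 0)
                mask |= dbg_categories[i].bit;
        p += len;
        if (*p == ',')
            ++p; /* skip the separator */
    }
    return mask;
}
```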
Pre-release

@franz released this Dec 18, 2017

Updated documentation and fixed a few bugs since RC1.

Pre-release

@franz released this Dec 6, 2017

pocl 1.0 Release Candidate 1

Highlights copied from CHANGES:

Highlights

  • Improved automatic local work-group sizing on kernel enqueue, taking
    into account standard constraints, SIMD width for vectorization as
    well as the number of compute units available on the device.
  • Support for NVIDIA GPUs via a new CUDA backend (currently experimental).
  • Removed support for BBVectorizer.
  • LLVM 5.0 is now supported.
  • A few build options have been added for distribution builds;
    see README.packaging.
  • Somewhat improved scalability in the CPU driver. CPUs with many cores,
    and programs using many work-items (WIs) with small kernels, can run
    somewhat faster.
  • Full conformance with the OpenCL 1.2 standard, enabled by default. There
    are some caveats, though; see the documentation.
  • When conformance is enabled, some kernel library functions might be
    slower than in previous releases.
  • pocl now reports OpenCL 1.2 instead of 2.0, except in HSA-enabled builds.
  • Updated format of pocl binaries, which is NOT backwards compatible.
    You'll need to clean any kernel caches.
  • Fixed several memory leaks.
  • Unresolved symbols (missing/misspelled functions, etc.) in a kernel now
    result in an error from clBuildProgram() instead of pocl silently ignoring
    them and then aborting at dlopen().
  • New env variable POCL_MEMORY_LIMIT=N limits the global memory size
    reported by pocl to N gigabytes.
  • New env variable POCL_AFFINITY (defaults to 0): if enabled, sets
    the affinity of each CPU driver pthread to a single core.
  • Improved AVX512 support (with LLVM 5.0). Note that even with LLVM 5.0
    there are still a few bugs (see pocl issue #555); AVX512 with LLVM 4.0 is
    much more broken and probably not worth trying.
  • The POCL_DEBUG env var has been revamped. Debug output can now be limited
    to these categories (or a combination of them): all,error,warning,general,
    memory,llvm,events,cache,locking,refcounts,timing,hsa,tce,cuda.
    The old setting POCL_DEBUG=1 now equals error+warning+general.
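The automatic local work-group sizing bullet lists the factors pocl takes into account. The following 1-D sketch is an assumed heuristic that weighs the same factors: the local size must divide the global size and stay within the maximum WG size, and it should preferably be a multiple of the SIMD width while leaving at least one work-group per compute unit. It is not pocl's actual algorithm, and pick_local_size() is a hypothetical name.

```c
#include <stddef.h>

/* Hypothetical 1-D heuristic: scan candidate local sizes from largest to
 * smallest and return the first that meets every preference, falling back
 * to weaker choices if none does. */
static size_t pick_local_size(size_t global, size_t max_wg,
                              size_t simd_width, size_t compute_units)
{
    size_t fallback_cu = 0;  /* largest divisor giving enough groups */
    size_t fallback_any = 0; /* largest valid divisor at all */
    size_t start = global < max_wg ? global : max_wg;

    for (size_t d = start; d >= 1; --d) {
        if (global % d != 0)
            continue; /* OpenCL 1.2: local size must divide global size */
        int enough_groups = global / d >= compute_units;
        if (enough_groups && simd_width > 0 && d % simd_width == 0)
            return d; /* all preferences met */
        if (enough_groups && fallback_cu == 0)
            fallback_cu = d;
        if (fallback_any == 0)
            fallback_any = d;
    }
    return fallback_cu ? fallback_cu : fallback_any;
}
```

Under these assumptions, a 1024-item launch on a device with 4 compute units and a SIMD width of 8 gets a local size of 256, i.e. four work-groups, one per compute unit.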