Branch: master
Commits on Nov 16, 2019
  1. clGetPlatformIDs return value fix

    pjaaskel committed Nov 16, 2019
    It should return CL_SUCCESS when num_platforms == NULL && num_entries
    == 0. At least Glow probes for the availability of OpenCL (in general)
    using these parameters.
    
    The spec says:
    "If platforms is not NULL, the num_entries must be greater than zero."
Commits on Oct 19, 2019
  1. Add a minimally intrusive and easy-to-use kernel execution time profiler

    pjaaskel committed Aug 20, 2019
    Setting POCL_TRACING=cq collects kernel execution times by force-enabling
    the command queue profiling feature, and dumps the collected stats at
    exit via atexit(). The purpose of this feature is to enable
    minimally intrusive profile collection: the profile data collector can
    choose when it gathers the timestamp data from the events.
    The impact on the observed execution profile is minimized by avoiding
    log writes, object copies and the like while collecting the data during
    execution.
    
    It relies on the standard event timestamps, so that devices can update
    them as (and when) they see fit during execution.
    
    The drawback is the accumulation of cl_object garbage, which should be taken
    into account when choosing the data collection interval; the collector should
    release the events and the extra data objects they hold often enough to keep
    memory consumption from becoming a problem.
    
    The current version does not perform garbage collection, but assumes
    that keeping the OpenCL objects alive until exit is not a problem.
    This is clearly the case for most OpenCL programs, which are rather
    simple: they neither run for long nor launch many commands over their lifetime.
    
    The default profile data collector counts only kernel commands at the moment.
    Collecting stats of data transfers would be a useful addition.
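The deferred collection described in the profiler commit above can be sketched as a per-kernel accumulator that is fed after the fact from event timestamps. This is not pocl's collector: every name below is hypothetical, and `start_ns`/`end_ns` stand in for the values one would read from a finished event with `clGetEventProfilingInfo` (`CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END`).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_KERNELS 64

/* Hypothetical record of one finished kernel command. */
struct sketch_event {
  const char *kernel_name;
  uint64_t start_ns; /* stand-in for the command start timestamp */
  uint64_t end_ns;   /* stand-in for the command end timestamp */
};

/* Accumulated statistics for one kernel name. */
struct sketch_stat {
  const char *kernel_name;
  uint64_t launches;
  uint64_t total_ns;
};

struct sketch_profile {
  struct sketch_stat stats[SKETCH_MAX_KERNELS];
  size_t count;
};

/* Adds one event to the profile. Returns 0 on success and -1 when the
   fixed-size table has no room for a new kernel name. */
int sketch_profile_add(struct sketch_profile *p, const struct sketch_event *ev)
{
  uint64_t elapsed = ev->end_ns - ev->start_ns;
  size_t i;

  /* Update the existing entry if this kernel was seen before. */
  for (i = 0; i < p->count; ++i) {
    if (strcmp(p->stats[i].kernel_name, ev->kernel_name) == 0) {
      p->stats[i].launches += 1;
      p->stats[i].total_ns += elapsed;
      return 0;
    }
  }

  if (p->count == SKETCH_MAX_KERNELS)
    return -1;

  /* First launch of this kernel: create a new entry. */
  p->stats[p->count].kernel_name = ev->kernel_name;
  p->stats[p->count].launches = 1;
  p->stats[p->count].total_ns = elapsed;
  p->count += 1;
  return 0;
}
```

Because the accumulator only reads timestamps that are already stored in the events, it can be run at whatever interval the collector chooses, or once at exit, without touching the command queue while kernels are executing.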
Commits on Oct 18, 2019
  1. Misc. cleanups and documentation

    pjaaskel committed Dec 1, 2018
    The only functional change is that cl_context_count is now decremented
    atomically instead of non-atomically.
  2. Add pytorch/Glow to the test suite

    pjaaskel committed Oct 14, 2019
    It (still) requires a noasserts LLVM build, and is thus not ready
    to be a tier1 test just yet.
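The atomic decrement of cl_context_count mentioned in the cleanup commit above can be illustrated with C11 atomics. This is a generic sketch, not pocl's code: the counter and function names are hypothetical, and pocl may well use its own atomic helpers rather than `<stdatomic.h>`.

```c
#include <stdatomic.h>

/* Hypothetical global count of live contexts. */
static atomic_int sketch_context_count = 0;

/* Called when a context is created. */
void sketch_context_created(void)
{
  atomic_fetch_add(&sketch_context_count, 1);
}

/* Called when a context is released. Returns the number of contexts that
   remain alive; atomic_fetch_sub yields the value before the decrement. */
int sketch_context_released(void)
{
  return atomic_fetch_sub(&sketch_context_count, 1) - 1;
}
```

With a plain `--counter`, two threads releasing contexts at the same time could both read the same old value and lose one decrement; the atomic read-modify-write avoids that.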
Commits on Oct 15, 2019
  1. LLVM 7.0 fix

    pjaaskel committed Oct 15, 2019
Commits on Oct 12, 2019
  1. [hsa] memory leak fixes

    pjaaskel committed Aug 31, 2019
  2. [hsa] Do not worry about "system memory" with base profile

    pjaaskel committed Aug 14, 2019
    Only the full profile needs to account for other allocations from the
    system memory. In the base profile, each device has its own global space
    from which the mem objects are allocated.
  3. format-branch

    pjaaskel committed Oct 12, 2019
  4. Do not enable vectorization for SPMD devices

    pjaaskel committed Jul 13, 2019
    If they desire intra-WI vectorization, they can
    run it in their target passes. This can have a
    dramatic impact on WG IR compilation time.
  5. Also reset num_buffers to zero

    pjaaskel committed Sep 26, 2019
    Avoids a segfault if the freeing is invoked multiple times for
    one reason or another.
  6. Do not run scalarizer

    pjaaskel committed Sep 21, 2019
    It is unclear whether this is still beneficial with the vectorizers in
    the latest LLVM versions. The logic should be integrated into the
    loop vectorizer, which should selectively scalarize vector datatypes
    and leave them intact in case it cannot produce better vectorization
    across the loop iterations.
  7. Avoid (re)optimization of printf

    pjaaskel committed Jul 12, 2019
    As printf is optimized during builtin library generation, optimizing it
    again just slows down the compilation of each kernel that calls printf.
    In any case, we are generally not interested in printf's performance,
    since it is typically used in debugging mode or in non-performance-critical
    parts.
  8. Remove invalid setPreservesAll()s.

    pjaaskel committed Jul 12, 2019
    They seem illegal since we modify the functions.
  9. Do not call instcombine explicitly anymore

    pjaaskel committed Jul 12, 2019
    With current LLVM versions, the extra calls no longer seem to be needed
    for good-quality results; they just slow down the WG function IR generation.
  10. matrix1: fixes to the test case

    pjaaskel committed Jan 9, 2019
    Fix the case when the max local size is larger than the global size. Also
    fix a div by zero due to an illegal assertion. The div by zero got
    triggered if the local WG is larger than the matrix size. It just
    gets silenced by the FPE handler, which is installed in case any
    of the CPU devices is built in.
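The "Also reset num_buffers to zero" commit above describes making a free routine safe to call more than once. A minimal sketch of that pattern follows; the struct and function names are hypothetical and the actual pocl data structure is not shown in this log.

```c
#include <stdlib.h>

/* Hypothetical container that owns an array of heap-allocated buffers. */
struct sketch_buffer_list {
  void **buffers;
  size_t num_buffers;
};

/* Frees every buffer and the array itself, then resets both fields so that
   a repeated call finds nothing to free. */
void sketch_free_buffers(struct sketch_buffer_list *list)
{
  size_t i;

  for (i = 0; i < list->num_buffers; ++i)
    free(list->buffers[i]);
  free(list->buffers); /* free(NULL) is a no-op on a repeated call */

  list->buffers = NULL;
  list->num_buffers = 0; /* without this, a second call would walk freed memory */
}
```

Resetting only the pointer would not be enough here: on a second call the loop would still run `num_buffers` times and dereference the NULL array, which is the kind of segfault the commit message refers to.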
Commits on Aug 6, 2019
  1. Added a missing clang lib

    pjaaskel committed Aug 6, 2019
Commits on Jul 15, 2019
  1. Add a news item about pocl-accel

    pjaaskel committed Jul 15, 2019
    Also fix a broken link in the accel dox.
Commits on Jul 11, 2019
  1. Gave Andrew some CREDITS.

    pjaaskel committed Jul 11, 2019
Commits on May 30, 2019
  1. Fix HSA build.

    pjaaskel committed May 30, 2019
  2. Force Python 3 for PyOpenCL.

    pjaaskel committed May 30, 2019
    This avoids headaches with systems where Python 3 (or actually py.test
    for Python 3) is not the default. Ubuntu 16.04 LTS at least failed to
    run some PyOpenCL tests because of this.
Commits on May 27, 2019
  1. Update CHANGES.

    pjaaskel committed May 25, 2019
  2. Clang format'd

    pjaaskel committed May 27, 2019
  3. Work-group function specialization rework.

    pjaaskel committed May 27, 2019
    * Allow specializing WG also for "small grid sizes". This is a limit
    defined by the device (e.g. it might be beneficial to specialize for
    < 16b indices to induce better vectorization).
    * Explicit specialize-parameters to build functions.
    * Add Range MD to the ID/size queries to help optimizers.
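The "small grid" condition in the first bullet can be sketched as a simple per-dimension comparison against a device-defined limit. The function name and the way the limit is passed are assumptions made for illustration; the log does not show how pocl represents the limit or which sizes it compares.

```c
#include <stddef.h>

/* Hypothetical check: returns 1 when every dimension of the launch grid is
   below the device-defined limit, and 0 otherwise. */
int sketch_is_small_grid(const size_t global_size[3], size_t device_limit)
{
  int dim;

  for (dim = 0; dim < 3; ++dim) {
    if (global_size[dim] >= device_limit)
      return 0;
  }
  return 1;
}
```

For example, a device that benefits from 16-bit indices could use a limit of 65536, so that a 256 x 256 x 1 grid qualifies for the specialized work-group function while a 100000 x 1 x 1 grid falls back to the generic one.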