Branch: master
Commits on Oct 19, 2019
  1. Add a minimally intrusive and easy-to-use kernel execution time profiler

    pjaaskel committed Aug 20, 2019
    Setting POCL_TRACING=cq collects kernel execution times by force-enabling
    the command queue profiling feature, and dumps the collected stats in an
    atexit() handler. The purpose of this feature is to enable implementation
    of minimally intrusive profile collection; the profile data collector can
    choose the occasions when it gathers the timestamp data from the events.
    The impact on the observed execution profile is minimized by avoiding
    writing any logs, copying objects, or similar work while collecting the
    data during execution.
    
    It relies on the standard event timestamps, which lets devices update
    them as (and when) they see fit during execution.
    
    The drawback is accumulation of cl_object garbage, which should be taken
    into account in the data collection interval; the collector should release
    the events and the extra data objects they hold often enough to keep
    memory consumption from becoming a problem.
    
    The current version does not perform garbage collection, but assumes
    that keeping the OpenCL objects alive until exit is not a problem,
    which is clearly the case for most OpenCL programs, as they are rather
    simple: they neither run for long nor launch many commands over their
    lifetime.
    
    The default profile data collector counts only kernel commands at the moment.
    Collecting stats of data transfers would be a useful addition.
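
The deferred collection described in this commit can be sketched in plain C as follows. This is only an illustration under stated assumptions, not pocl's implementation: `sample_event`, `profile_stats` and `collect_stats` are hypothetical names, and the two timestamp fields stand in for the values an application would normally read with `clGetEventProfilingInfo` (`CL_PROFILING_COMMAND_START` and `CL_PROFILING_COMMAND_END`).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a retained event. The device fills in the
   timestamps (in nanoseconds) whenever it sees fit during execution. */
typedef struct {
    uint64_t start_ns;
    uint64_t end_ns;
    int is_kernel; /* nonzero for kernel commands, zero for e.g. transfers */
} sample_event;

typedef struct {
    uint64_t total_ns;
    size_t kernel_count;
} profile_stats;

/* Walk the retained events once, after the commands have finished, and sum
   the kernel execution times. Nothing is logged or copied while the
   commands are running. */
static profile_stats collect_stats(const sample_event *events, size_t n)
{
    profile_stats stats = {0, 0};
    for (size_t i = 0; i < n; ++i) {
        if (!events[i].is_kernel)
            continue; /* only kernel commands are counted in this sketch */
        assert(events[i].end_ns >= events[i].start_ns);
        stats.total_ns += events[i].end_ns - events[i].start_ns;
        stats.kernel_count++;
    }
    return stats;
}
```

A collector built this way trades memory for low intrusiveness: the events must stay alive until `collect_stats` runs, which is the garbage accumulation the commit message warns about.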
Commits on Oct 18, 2019
  1. Misc. cleanups and documentation

    pjaaskel committed Dec 1, 2018
    Except that cl_context_count is now decremented atomically instead of
    non-atomically.
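
For readers unfamiliar with the difference, the following sketch shows an atomic counter decrement using C11 `<stdatomic.h>`. It is an assumption-laden illustration: `context_count`, `context_created` and `context_released` are hypothetical names, and pocl may well use its own atomic macros rather than C11 atomics.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical global counter; static storage makes it start at zero. */
static atomic_int context_count;

static void context_created(void)
{
    atomic_fetch_add(&context_count, 1);
}

/* Returns the number of contexts remaining after this release. */
static int context_released(void)
{
    /* atomic_fetch_sub returns the value held before the subtraction, so
       two threads releasing at the same time never see the same value. */
    int previous = atomic_fetch_sub(&context_count, 1);
    assert(previous > 0); /* releasing more than was created is a bug */
    return previous - 1;
}
```

With a plain `context_count--`, two concurrent releases could both read the same value and the counter would lose a decrement.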
  2. Add pytorch/Glow to the test suite

    pjaaskel committed Oct 14, 2019
    It (still) requires a noasserts LLVM build, so it is not ready
    to be a tier1 test just yet.
Commits on Oct 16, 2019
Commits on Oct 15, 2019
  1. LLVM 7.0 fix

    pjaaskel committed Oct 15, 2019
Commits on Oct 14, 2019
Commits on Oct 12, 2019
  1. [hsa] memory leak fixes

    pjaaskel committed Aug 31, 2019
  2. [hsa] Do not worry about "system memory" with base profile

    pjaaskel committed Aug 14, 2019
    Only the full profile needs to be concerned with other allocations from
    the system memory. In the base profile, each device has its own global
    space from which the mem objects are allocated.
  3. format-branch

    pjaaskel committed Oct 12, 2019
  4. Fix pocl.barrier calls being removed too early

    linehill authored and pjaaskel committed Aug 12, 2019
    The workgroup pass replaced the pocl.barrier declaration with an empty
    definition, which then caused barrier calls to be removed and
    unwanted/illegal code duplication to happen in the subsequent standard
    LLVM optimizations.
  5. Fix a warning on test_ldexp.cl

    linehill authored and pjaaskel committed Jun 14, 2019
    Fix a warning on test_ldexp.cl when cl_khr_fp64 is not available.
  6. Do not enable vectorization for SPMD devices

    pjaaskel committed Jul 13, 2019
    If they desire intra-WI vectorization, they can
    launch it in their target passes. This can have a
    dramatic impact on WG IR compilation time.
  7. Optional calls to dump LLVM IR pass execution timing info

    linehill authored and pjaaskel committed May 28, 2019
    Helps find the compilation time bottlenecks.
  8. Also reset num_buffers to zero

    pjaaskel committed Sep 26, 2019
    Avoids a segfault if the freeing is invoked multiple times for
    one reason or another.
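
The idea behind this fix can be sketched as an idempotent free routine. The `buffer_list` type and both functions below are hypothetical and only illustrate the pattern; they are not taken from the pocl sources.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical container that owns a number of heap buffers. */
typedef struct {
    void **buffers;
    size_t num_buffers;
} buffer_list;

/* Allocates n small buffers. Returns 0 on success, -1 on failure. */
static int buffer_list_init(buffer_list *list, size_t n)
{
    list->num_buffers = 0;
    list->buffers = calloc(n, sizeof *list->buffers);
    if (list->buffers == NULL)
        return -1;
    for (size_t i = 0; i < n; ++i) {
        list->buffers[i] = malloc(16);
        if (list->buffers[i] == NULL)
            return -1; /* the caller can still call buffer_list_free() */
        list->num_buffers++;
    }
    return 0;
}

/* Safe to call more than once: after the first call the count is zero and
   the pointer is NULL, so a second call frees nothing. */
static void buffer_list_free(buffer_list *list)
{
    assert(list != NULL);
    for (size_t i = 0; i < list->num_buffers; ++i)
        free(list->buffers[i]);
    free(list->buffers);
    list->buffers = NULL;
    list->num_buffers = 0; /* without this reset, a second call would
                              walk freed memory */
}
```

Resetting only the pointer but not the count would leave the loop iterating over a NULL array on the second call, which is the kind of segfault the commit avoids.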
  9. Do not run scalarizer

    pjaaskel committed Sep 21, 2019
    It is unclear whether this is still beneficial with the vectorizers in
    the latest LLVM versions. The logic should be integrated into the
    loop vectorizer, which should selectively scalarize vector datatypes
    and leave them intact in case it cannot produce better vectorization
    across the loop iterations.
  10. Avoid (re)optimization of printf

    pjaaskel committed Jul 12, 2019
    As printf is optimized during builtin library generation, reoptimizing
    it just slows down the compilation of each kernel that calls printf.
    Also, we are generally not interested in printf's performance since it's
    typically used in debugging mode or in non-performance-critical
    parts.
  11. Remove invalid setPreservesAll()s.

    pjaaskel committed Jul 12, 2019
    They seem illegal since we modify the functions.
  12. Do not call instcombine explicitly anymore

    pjaaskel committed Jul 12, 2019
    The extra calls no longer seem to be needed for good quality results with
    current LLVM versions; they just slow down the WG function IR generation.
  13. Don't internalize globals starting with "__wrap_"

    linehill authored and pjaaskel committed May 28, 2019
    A use case is call replacement via the GNU linker switch --wrap.  The
    functions starting with "__wrap_" may not be referenced until the final
    link, and LLVM optimizations may delete them if they are internalized.
  14. Fix an assertion triggered when replacing __cl_printf

    linehill authored and pjaaskel committed May 24, 2019
    Fix an LLVM assertion that was triggered when replacing calls to
    __cl_printf with __pocl_printf due to a return value type mismatch.  LLVM
    changed the return type of __cl_printf to void when no one was using the
    value, which led to the issue.
  15. printf: fix arguments, have meaningful return value

    linehill authored and pjaaskel committed May 24, 2019
    - __cl_printf: Put valid arguments into the __pocl_printf_format_full()
      call so LLVM's interprocedural optimizations do not wreak havoc,
      e.g. turning the call into a trap call because the format string
      argument was NULL (as a placeholder).
    - Actually return a possible error value instead of always returning
      zero in the __cl_printf and __pocl_printf functions.
  16. matrix1: fixes to the test case

    pjaaskel committed Jan 9, 2019
    Fix the case when the max local size is larger than the global size.
    Also fix a div by zero due to an illegal assertion. The div by zero got
    triggered if the local wg is larger than the matrix size. It just
    gets silenced by the FPE handler, which is installed in case any
    of the CPU devices is built in.
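
One way to guard against this class of problem is to clamp the local size to the global size before dividing. The sketch below is a hypothetical illustration and not the actual matrix1 fix; `pick_local_size` and `num_groups` are made-up helper names.

```c
#include <assert.h>
#include <stddef.h>

/* Pick a work-group size that never exceeds the global size and divides
   it evenly, so later divisions by the group count cannot hit zero. */
static size_t pick_local_size(size_t global_size, size_t max_local_size)
{
    assert(global_size > 0 && max_local_size > 0);
    size_t local = max_local_size < global_size ? max_local_size
                                                : global_size;
    while (global_size % local != 0)
        --local; /* stops at 1 at the latest, since anything divides by 1 */
    return local;
}

/* Number of work-groups along one dimension; at least 1 when the local
   size comes from pick_local_size(). */
static size_t num_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && local_size <= global_size);
    return global_size / local_size;
}
```

Without the clamp, a device maximum of 256 and a matrix dimension of 8 would give a group count of `8 / 256 == 0`, and any later division by that count would fault.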
Commits on Sep 25, 2019
  1. Merge branch 'release_1_4'

    Michal Babej committed Sep 25, 2019
  2. Update documentation

    Michal Babej committed Sep 25, 2019
Commits on Sep 24, 2019
  1. Merge branch 'release_1_4'

    Michal Babej committed Sep 24, 2019
  2. Fixes to global memory size detection

    Michal Babej committed Sep 24, 2019
    * fix getrlimit() use without CMake detection
    * fix rlimit_data being applied only to max_mem_alloc_limit,
      instead of global_mem_size
    * fix computation in size_t, use cl_ulong instead, even
      on 32-bit systems
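
The two arithmetic points in the list above can be sketched as follows. This is a minimal illustration under assumptions: `uint64_t` stands in for `cl_ulong`, the helper names are hypothetical, and the real pocl code may detect memory differently. Only `getrlimit()`, `RLIMIT_DATA` and `RLIM_INFINITY` are actual POSIX names.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/resource.h>

/* Widen before multiplying: with a 32-bit size_t, pages * page_size would
   wrap around for memory sizes of 4 GiB and above. */
static uint64_t total_bytes(uint32_t pages, uint32_t page_size)
{
    assert(page_size > 0);
    return (uint64_t)pages * page_size;
}

/* Clamp a byte count to a soft resource limit, if one is set. */
static uint64_t clamp_to_limit(uint64_t bytes, rlim_t soft_limit)
{
    if (soft_limit == RLIM_INFINITY)
        return bytes;
    return bytes < (uint64_t)soft_limit ? bytes : (uint64_t)soft_limit;
}

/* Apply the RLIMIT_DATA soft limit to the global memory size itself, so
   that any value derived from it (such as a maximum allocation size)
   inherits the limit. */
static uint64_t limited_global_mem_size(uint64_t detected_bytes)
{
    struct rlimit lim;
    if (getrlimit(RLIMIT_DATA, &lim) != 0)
        return detected_bytes; /* no limit information available */
    return clamp_to_limit(detected_bytes, lim.rlim_cur);
}
```

Clamping the global size first, rather than only the derived allocation limit, keeps the two reported values consistent with each other.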
  3. Remove unused/broken configurations from Travis CI

    Michal Babej committed Sep 24, 2019
    LLVM 4 is not supported anymore, and the Clang build on Mac OS X
    seems broken because of an unknown compiler flag.
  4. perform compile test to select -march or -mcpu for clang

    anbe42 authored and Michal Babej committed Sep 17, 2019
  5. add custom_try_compile_clang_silent macro

    anbe42 authored and Michal Babej committed Sep 17, 2019
  6. add printf tests for parameter passing

    anbe42 authored and Michal Babej committed Sep 17, 2019
  7. enable --exclude-libs on all UNIX except Mac OS X

    anbe42 authored and Michal Babej committed Sep 17, 2019