Commits on Oct 1, 2018
  1. Merge pull request #597 from ThePortlandGroup/nv_stage

    sscalpone committed Oct 1, 2018
    Pull 2018-09-30T14-55 Recent NVIDIA Changes
Commits on Sep 30, 2018
  1. Add another guard for the previous check-in for resolveImp() function

    gklimowicz committed Sep 30, 2018
    Do not overload the symbol if its scope is the same as the current
    scope. That is, in the case where the symbol is an interface, its scope
    is its enclosing module (when the interface is defined in a
    module). If the symbol is a procedure, then its scope is itself.
    Therefore, also look at SCOPEG(scope) in this case.
  2. Clean up all LOGICAL, TRUE, and FALSE usage in flang2

    gklimowicz committed Sep 30, 2018
    At the same time this will turn new usage of those into syntax errors.
  3. Fix parse of "<prefix spec> <data type> <prefix spec>"

    gklimowicz committed Sep 30, 2018
    Like:
    
     PURE INTEGER MODULE FUNCTION f1(i)
        ...
     END FUNCTION
Commits on Sep 27, 2018
Commits on Sep 25, 2018
  1. Improve single/double precision sincos, double precision power

    gklimowicz committed Sep 25, 2018
    Add AVX2 (FMA) versions of integrated sincos to libpgmath.
    
    Remove all files no longer used from repository.
Commits on Sep 21, 2018
  1. Merge pull request #594 from ThePortlandGroup/nv_stage

    sscalpone committed Sep 21, 2018
    Pull 2018-09-21T05-56 Recent NVIDIA Changes
  2. Fix syntax error incorrectly detected with defined unary operator

    gklimowicz committed Sep 21, 2018
    In is_intrinsic_opr(), make sure rop is not NULL before processing
    a binary expression that uses a defined binary operator. The rop
    variable can be NULL if processing an expression that uses a
    defined unary operator.
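
    The guard described above can be sketched roughly as follows. This is
    only an illustration and not the actual flang code: the Expr struct and
    is_binary_expr() are hypothetical names, and only rop is taken from the
    commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical expression node; flang's real data structures differ. */
typedef struct Expr {
  const char *opr;  /* operator name, e.g. ".plus." */
  struct Expr *lop; /* left operand (the only operand of a unary operator) */
  struct Expr *rop; /* right operand; NULL for a defined unary operator */
} Expr;

/* Returns true only when the expression has both operands.
 * Testing rop first means a unary expression is never treated as binary. */
bool is_binary_expr(const Expr *e)
{
  assert(e != NULL); /* precondition: caller passes a valid node */
  if (e->rop == NULL)
    return false; /* defined unary operator: no right operand to inspect */
  return e->lop != NULL;
}
```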
  3. Work around clang compiler complaining about narrowing of unsigned

    gklimowicz committed Sep 21, 2018
    Work around the clang compiler complaining about narrowing of unsigned
    to int.  (See 6.3.1.3 para 3 of the C standard.)
    
    Addresses issue #588
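
    The commit message does not show the actual fix. One common way to
    avoid relying on the implementation-defined result of an out-of-range
    unsigned-to-int conversion is an explicit, range-checked cast, sketched
    below. The helper name to_int_checked() is hypothetical and does not
    come from the flang sources.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: convert unsigned to int only when the value is
 * representable, so the conversion never hits the implementation-defined
 * case of C 6.3.1.3 para 3. */
int to_int_checked(unsigned int u)
{
  assert(u <= (unsigned int)INT_MAX); /* value must fit in an int */
  return (int)u; /* explicit cast documents the intended narrowing */
}
```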
  4. Add small matrix multiplication path to the 'matmul' intrinsic

    gklimowicz committed Sep 21, 2018
    Removed the unnecessary array transformation arguments to the complex calls:
      mmcmplx16.c
      mmcmplx8.c
    
    Added small matrix computation path:
    
      m*ax*b_*.f90
    
    Added parameter 'min_blocked_mult', which sets the minimum matrix 'sizes'
    required to take the blocked multiplication path. The value was
    derived empirically from experiments on X86 and OpenPower
    architectures.
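
    A rough sketch of a size-based dispatch between a small-matrix path and
    a blocked path is shown below. It is written in C with row-major
    storage purely for illustration; the runtime's small-matrix path lives
    in Fortran sources. How 'sizes' are actually measured is not stated in
    the commit, so comparing the largest dimension against min_blocked_mult
    is an assumption, as are the function names.

```c
#include <assert.h>
#include <stddef.h>

/* Small-matrix path: plain triple loop, c (m x n) = a (m x k) * b (k x n).
 * Row-major storage is an assumption of this sketch. */
void matmul_small(const double *a, const double *b, double *c,
                  size_t m, size_t k, size_t n)
{
  assert(a != NULL && b != NULL && c != NULL);
  for (size_t i = 0; i < m; ++i) {
    for (size_t j = 0; j < n; ++j) {
      double sum = 0.0;
      for (size_t p = 0; p < k; ++p)
        sum += a[i * k + p] * b[p * n + j];
      c[i * n + j] = sum;
    }
  }
}

/* Returns 1 when the blocked path should be taken, 0 for the small path.
 * Assumption: "size" means the largest of the three dimensions. */
int use_blocked_path(size_t m, size_t k, size_t n, size_t min_blocked_mult)
{
  size_t largest = m;
  if (k > largest)
    largest = k;
  if (n > largest)
    largest = n;
  return largest >= min_blocked_mult;
}
```

    A caller would test use_blocked_path() first and fall back to
    matmul_small() when it returns 0; the threshold itself has to be tuned
    per architecture.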
  5. Fix issue with forward reference to type-bound procedure

    gklimowicz committed Sep 21, 2018
    In resolveImp(), add another condition for generating a forward
    reference from a type bound procedure declaration. Recall that a type
    bound procedure declaration is of the form "procedure :: bindingName =>
    implementation"  (or "procedure :: implementation" when bindingName and
    implementation are the same name).  If there is another symbol of the same
    name as implementation that is overloaded but not inherited, then make
    sure that a forward reference is declared for the implementation. The
    implementation is considered overloaded in this case if it's not
    inherited from a parent type, is a procedure, has the private flag, and
    it's declared in the current scope. Note that this function checks for
    some other special overloading cases as well (e.g., when overloading an
    intrinsic, a generic procedure, etc.)
Commits on Sep 20, 2018
  1. Improvements to libpgmath: log

    gklimowicz committed Sep 20, 2018
    Add superword level parallelism vectorization to FMA version
    of double precision LOG (X86-64 only).
Commits on Sep 17, 2018
  1. Merge pull request #593 from ThePortlandGroup/nv_stage

    sscalpone committed Sep 17, 2018
    Pull 2018-09-17T07-16 Recent NVIDIA Changes
  2. Improve vectorization with conditionals around sin/cos

    gklimowicz committed Sep 17, 2018
    Recognize a loop with half-size predicate conditional and body
    computation using sin or cosine intrinsics.
    
    This change provides enhanced capabilities for such loops.
    Previously we tried to prevent vectorization due to the nature of
    sin/cos handling, and now these loops correctly vectorize.
Commits on Sep 14, 2018
  1. Remove redundancies from flang runtime and libpgmath

    gklimowicz committed Sep 14, 2018
    Remove cdexp and round from flang and flangrti; they are in libpgmath.
    
    Remove mthdecls.h from flang/runtime; it is unused; libpgmath has its own.
  2. Fix a recent regression in atomics support

    gklimowicz committed Sep 14, 2018
    A recent update changed the types of the bit-fields in ATOMIC_INFO from
    unsigned int to enum types.  This resulted in the values of those
    fields being incorrectly sign extended when extracted from the struct.
    Fix the problem by changing the fields of ATOMIC_INFO to not be
    bit-fields.  This required changing the code that encodes/decodes
    ATOMIC_INFO to/from an int.
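
    The approach described above, plain struct fields plus explicit
    encoding to and from an int, can be sketched as follows. The field
    names, enum values, and 2-bit layout are hypothetical; the real
    ATOMIC_INFO layout is not given in the commit message. Masking on
    unsigned values means a field holding its highest enumerator cannot
    come back as a negative number, which is what can happen with an
    enum-typed bit-field that the compiler treats as signed.

```c
#include <assert.h>

/* Hypothetical enums standing in for the real ATOMIC_INFO field types. */
typedef enum { MSZ_BYTE, MSZ_HALF, MSZ_WORD, MSZ_DWORD } MemSize;
typedef enum { ORD_RELAXED, ORD_ACQUIRE, ORD_RELEASE, ORD_SEQ_CST } MemOrder;

/* Plain fields instead of bit-fields: no sign extension on read. */
typedef struct {
  MemSize msz;
  MemOrder order;
} AtomicInfo;

/* Pack both fields into an int: bits 0-1 hold msz, bits 2-3 hold order. */
int atomic_info_encode(AtomicInfo info)
{
  assert((unsigned)info.msz < 4u && (unsigned)info.order < 4u);
  return (int)(((unsigned)info.order << 2) | (unsigned)info.msz);
}

/* Unpack using unsigned masks so the top bit of a field is never
 * interpreted as a sign bit. */
AtomicInfo atomic_info_decode(int bits)
{
  AtomicInfo info;
  info.msz = (MemSize)((unsigned)bits & 0x3u);
  info.order = (MemOrder)(((unsigned)bits >> 2) & 0x3u);
  return info;
}
```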
  3. Update copyright notice

    gklimowicz committed Sep 14, 2018
    Remove runtime locks around malloc()/free() in mpmalloc.c for
    glibc-based systems.
    
    This is due to the following statement in the glibc manual:
    
       To avoid corruption in multithreaded applications,
       mutexes are used internally to protect the memory-management
       data structures employed by these functions. In a multithreaded
       application in which threads simultaneously allocate and free
       memory, there could be contention for these mutexes. To scalably
       handle memory allocation in multithreaded applications,
       glibc creates additional memory allocation arenas if mutex
       contention is detected. Each arena is a large region of memory
       that is internally allocated by the system (using brk(2)
       or mmap(2)), and managed with its own mutexes.
    
    Keeping locks around these calls in the flang runtime library
    can defeat the optimization when tcmalloc is preloaded
    to replace the standard malloc()/free() implementation with
    one optimized for reducing lock contention.
    
    Implements pull request #460
  4. Fix VPERMUTE for input and returned vectors having different sizes

    gklimowicz committed Sep 14, 2018
    Example:

      VPERMUTE <2 x float>, <2 x float>, (0, 1, 2, 3), <4 x float>  <- return dtype

    This example shuffles the contents from both <2 x float> vectors
    into a single <4 x float> vector. Previously, in this example, the
    <2 x float>s were bitcast into <4 x float>s, which triggered an
    internal error. This change prevents that bitcast from happening.
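
    The shuffle in the example can be modeled in scalar C as shown below.
    This sketch only illustrates the intended result of the operation and
    assumes that mask indices address the concatenation of the two input
    vectors; it is not the compiler's code generation, and the function
    name vpermute_model() is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Scalar model of a two-input permute: out[i] is taken from the
 * concatenation of v1 and v2 (each n elements) at position mask[i].
 * The output length (mask_len) may differ from the input length n. */
void vpermute_model(const float *v1, const float *v2, size_t n,
                    const int *mask, size_t mask_len, float *out)
{
  for (size_t i = 0; i < mask_len; ++i) {
    assert(mask[i] >= 0 && (size_t)mask[i] < 2 * n); /* index in range */
    size_t idx = (size_t)mask[i];
    out[i] = idx < n ? v1[idx] : v2[idx - n];
  }
}
```

    With two 2-element inputs and the mask (0, 1, 2, 3), the 4-element
    result is simply the first vector followed by the second, so no
    widening of the inputs is needed.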
Commits on Sep 13, 2018
  1. Merge pull request #590 from ThePortlandGroup/nv_stage

    sscalpone committed Sep 13, 2018
    Pull 2018-09-13T12-17 Recent NVIDIA Changes
  2. Remove dependency on mthdecls.h where not really needed

    gklimowicz committed Sep 13, 2018
    Convert LONGLONG_T to int64_t.
  3. Merge pull request #587 from ThePortlandGroup/nv_stage

    sscalpone committed Sep 13, 2018
    Pull 2018-09-12T18-00 Recent NVIDIA Changes
  4. Fix when the array being passed is a member of a derived type

    gklimowicz committed Sep 13, 2018
    When constructing the runtime call that creates the temporary
    descriptor, check_member() needs to be called.