
Merge updates for OpenMP 4.5 support

Summary:
  - Remove state machine in the outliner.
  - Fix deadlock in reductions with print/write statements.
  - Fix issue with privatization.
  - Add support for team private scalar variables.
  - Remove outlining of teams construct.
  - Remove outlining of parallel construct.
  - Remove unused xflag definitions from Flang.

Adapted RFCNT information for multiple outlining.
    The reference count (RFCNT) of a label sptr is
    changed during the first expand. On the second
    expand of the same ILM, the expander treats the
    label sptr according to its changed RFCNT
    value, which results in wrong ILI being
    generated. This patch adds a new symbol field,
    RFCNTDEV, and saves the original RFCNT
    value in it. On the second expand, the
    original value is restored from RFCNTDEV
    into RFCNT.
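
The save/restore idea can be sketched in a few lines of C. This is only an illustration: `label_sym`, `save_rfcnt`, `restore_rfcnt`, and `expand_once` are hypothetical names, and the real symbol table (generated from symtab.n) and expander look different.

```c
#include <assert.h>

/* Hypothetical, simplified symbol entry (not the real Flang symbol table). */
typedef struct {
  int rfcnt;    /* reference count, modified while expanding */
  int rfcntdev; /* copy of the original count, kept for the second expand */
} label_sym;

/* Before the first expand: remember the original count. */
void save_rfcnt(label_sym *s) { s->rfcntdev = s->rfcnt; }

/* Before the second expand: put the original count back. */
void restore_rfcnt(label_sym *s) { s->rfcnt = s->rfcntdev; }

/* Stand-in for an expand pass that changes the count as a side effect. */
void expand_once(label_sym *s) { s->rfcnt = 0; }

/* Runs one expand and returns the count a second expand would start from. */
int rfcnt_seen_by_second_expand(int original)
{
  label_sym s = { original, 0 };
  save_rfcnt(&s);
  expand_once(&s);
  restore_rfcnt(&s);
  return s.rfcnt;
}
```

Without the restore step, the second expand in this sketch would start from the modified count instead of the original one.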

Enhancements to the uplevel struct passed to the
outlined functions.

Add OpenMP directive based multiple outliner.
    The previous implementation, a state machine in
    the outliner, was based on temporary ILM
    files. That turned out to be an insufficient
    mechanism: because there are multiple
    outlined functions in the same temp ILM file,
    the compiler has to outline the ones inside
    an OpenMP target region more than once.

    This patch removes the state machine from the
    outliner and introduces recompilation in the
    process_input function in the main file.

Fix deadlock with write statements in OpenMP reduction
    When a function that contains an OpenMP
    reduction is called within a Fortran write
    statement and the code is executed by
    multiple threads, it causes a deadlock. The
    problem happens because the compiler
    generates a lock (`_mp_bcs/ecs_nest`) around
    the write statement to make it thread safe,
    and the same lock is also used to implement
    the OpenMP reduction. When the reduction is
    called within a print or write statement, the
    same nested lock is acquired twice by
    multiple threads, causing the deadlock. The
    solution is to use different locks for write
    statements and reductions. This introduces
    a new set of locks (`_mp_bcs/ecs_nest_red`)
    for the reduction in the runtime.
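
One way to picture the conflict is the single-threaded toy model below. It is not the runtime code and uses no real threads: `toy_nest_lock`, `toy_try_acquire`, `toy_release`, and `reduction_can_proceed` are hypothetical names, and the scenario (one thread holding the write lock while another reaches the reduction) is an assumed reading of the description above.

```c
#include <assert.h>

/* Toy model of a nestable lock: the owner may re-acquire it, others may not. */
typedef struct {
  int owner; /* thread id of the holder, -1 when free */
  int depth; /* nesting depth held by the owner */
} toy_nest_lock;

/* Returns 1 if thread `tid` gets the lock, 0 if a real thread would block. */
int toy_try_acquire(toy_nest_lock *l, int tid)
{
  if (l->owner == -1 || l->owner == tid) {
    l->owner = tid;
    l->depth++;
    return 1;
  }
  return 0;
}

/* Drops one nesting level; the lock becomes free when the depth reaches 0. */
void toy_release(toy_nest_lock *l)
{
  assert(l->depth > 0);
  if (--l->depth == 0)
    l->owner = -1;
}

/* Thread 0 holds the write lock while thread 1 reaches the reduction.
 * Returns 1 if thread 1 can enter the reduction, 0 if it would block. */
int reduction_can_proceed(int separate_locks)
{
  toy_nest_lock io_lock = { -1, 0 };
  toy_nest_lock red_lock = { -1, 0 };
  toy_nest_lock *for_reduction = separate_locks ? &red_lock : &io_lock;
  int ok;

  (void)toy_try_acquire(&io_lock, 0);     /* write statement, thread 0 */
  ok = toy_try_acquire(for_reduction, 1); /* reduction, thread 1 */
  if (ok)
    toy_release(for_reduction);
  toy_release(&io_lock);
  return ok;
}
```

In this model, `reduction_can_proceed(0)` reports that the second thread would block on the shared lock, while `reduction_can_proceed(1)` lets it through because the reduction has its own lock.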

Privatization while generating device code.
    Privatization is implemented in several
    places in the compiler; iliutil is one of
    them, and it generates the ILI that loads
    private data. The function
    is_llvm_local_private determines whether an
    sptr is local private: it returns true if the
    current function and the enclosing function
    of the sptr are the same. However, this check
    fails while generating code for the device,
    because in OpenMP we generate two function
    sptrs, one for the host and one for the
    device. The enclosing function of the sptr is
    the host outlined function, while GBL_CURRFUNC
    is the device outlined function. To make a
    correct comparison, we need to find the host
    function sptr of GBL_CURRFUNC and compare
    it with the enclosing function of the sptr.
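
The corrected comparison might look roughly like the sketch below. The table layout and the names `sym_info`, `host_function_of`, and `is_local_private` are assumptions made for illustration; they are not the compiler's actual data structures or the real signature of is_llvm_local_private.

```c
#include <assert.h>

#define NO_FUNC 0 /* hypothetical "no symbol" value */

/* Hypothetical mini symbol table entry, indexed by sptr. */
typedef struct {
  int enclfunc;  /* for a variable: sptr of the function that encloses it */
  int host_func; /* for a device outlined function: sptr of its host twin */
} sym_info;

/* Map a function to its host counterpart; host functions map to themselves. */
int host_function_of(const sym_info *tab, int func_sptr)
{
  return tab[func_sptr].host_func != NO_FUNC ? tab[func_sptr].host_func
                                             : func_sptr;
}

/* Compare the variable's enclosing function against the host version of the
 * current function, so the check also holds while compiling device code. */
int is_local_private(const sym_info *tab, int var_sptr, int currfunc)
{
  return tab[var_sptr].enclfunc == host_function_of(tab, currfunc);
}
```

Comparing `enclfunc` directly against the device function's sptr would report the variable as not private, which is the failure described above.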

Handle private sptr when there is no outlining
    Currently the compiler doesn't process
    private sptrs when there is no outlining;
    for example, it doesn't initialize the SNAME
    field. However, privatization is possible
    with "!$omp do" or "!$omp distribute", which
    don't require outlining.
    This fix processes private sptrs even if there
    is no outlining.

Support for team private scalar variables.
    Team private variables should be private to
    the team, but shared across the threads within
    a team. In our OpenMP model, we associate
    CTAs with teams. Therefore, shared memory
    is the natural place to create and keep
    team private variables if they are scalar.
    Non-scalar team private variables are not
    supported yet.

Outlining elimination for teams construct.
    This change implements outlining elimination
    for the teams construct. The motivation is
    that outlining prevents further compiler and
    runtime optimization, so we should avoid it
    as much as we can. It also introduces a
    mechanism in iliutil.cpp that finds the
    enclosing function of a symbol.

Other features:
    - New optimization - Outlining elimination
      for parallel construct. It can be enabled
      with -Mx,233,0x10
    - Fix for outlining elimination for teams
      construct. It was passing the wrong function
      pointer to _kmpc_fork_call.
    - Fix the setting of gbl.ompaccel_intarget,
      which is set if the region is inside a target
      construct, while compiling code for the
      target device.
    - Moved the PARENCLFUNC feature setting into
      host expanding. It helps to find enclosing
      function info for symbols that are created
      in the outlined functions.

    - New API "__tgt_target_teams_parallel" is
      added in the nvomp RTL. This change implements
      its associated codegen in the compiler.
    - Improved the TARGETMODE ILM, which is located
      just before BTARGET. TARGETMODE now carries
      num_teams, thread_limit, and num_threads
      besides the combined construct mode. We use
      all this information while generating the
      call to "__tgt_target_teams_parallel".
gklimowicz committed Jul 25, 2019
1 parent 0880aa2 commit 96d9ea819ea6da871f542043537411db2c8b1c65
Showing with 1,433 additions and 601 deletions.
  1. +2 −1 include/flang/Error/errmsg.n
  2. +31 −1 runtime/flangrti/llcrit.c
  3. +2 −0 tools/flang1/flang1exe/dump.c
  4. +1 −0 tools/flang1/flang1exe/flgdf.h
  5. +1 −1 tools/flang1/flang1exe/global.h
  6. +21 −1 tools/flang1/flang1exe/lowerilm.c
  7. +41 −56 tools/flang1/flang1exe/semsmp.c
  8. +1 −1 tools/flang1/flang1exe/semutil.c
  9. +3 −3 tools/flang1/flang1exe/semutil2.c
  10. +21 −20 tools/flang2/docs/xflag.n
  11. +2 −2 tools/flang2/flang2exe/aarch64-Linux/flgdf.h
  12. +29 −21 tools/flang2/flang2exe/cgmain.cpp
  13. +11 −10 tools/flang2/flang2exe/expand.cpp
  14. +87 −63 tools/flang2/flang2exe/expsmp.cpp
  15. +29 −10 tools/flang2/flang2exe/iliutil.cpp
  16. +3 −3 tools/flang2/flang2exe/kmpcutil.cpp
  17. +4 −3 tools/flang2/flang2exe/kmpcutil.h
  18. +7 −7 tools/flang2/flang2exe/llutil.cpp
  19. +7 −0 tools/flang2/flang2exe/llutil.h
  20. +35 −17 tools/flang2/flang2exe/main.cpp
  21. +8 −0 tools/flang2/flang2exe/mwd.cpp
  22. +374 −132 tools/flang2/flang2exe/ompaccel.cpp
  23. +119 −31 tools/flang2/flang2exe/ompaccel.h
  24. +248 −51 tools/flang2/flang2exe/outliner.cpp
  25. +50 −5 tools/flang2/flang2exe/outliner.h
  26. +2 −2 tools/flang2/flang2exe/ppc64le-Linux/flgdf.h
  27. +233 −143 tools/flang2/flang2exe/tgtutil.cpp
  28. +8 −1 tools/flang2/flang2exe/tgtutil.h
  29. +3 −1 tools/flang2/flang2exe/upper.cpp
  30. +2 −2 tools/flang2/flang2exe/x86_64-Linux/flgdf.h
  31. +10 −2 tools/flang2/utils/ilmtp/aarch64/ilmtp.n
  32. +10 −2 tools/flang2/utils/ilmtp/ppc64le/ilmtp.n
  33. +10 −1 tools/flang2/utils/ilmtp/x86_64/ilmtp.n
  34. +4 −0 tools/flang2/utils/symtab/symtab.n
  35. +3 −1 tools/flang2/utils/upper/upperilm.in
  36. +2 −2 tools/flang2/utils/upper/upperl.c
  37. +4 −1 tools/shared/llmputil.h
  38. +5 −4 tools/shared/utils/global.h
@@ -1,5 +1,5 @@
.\"/*
.\" * Copyright (c) 1994-2018, NVIDIA CORPORATION. All rights reserved.
.\" * Copyright (c) 1994-2019, NVIDIA CORPORATION. All rights reserved.
.\" *
.\" * Licensed under the Apache License, Version 2.0 (the "License");
.\" * you may not use this file except in compliance with the License.
@@ -717,3 +717,4 @@ Please update your licenses to use this feature.
.MS S 1200 "OpenMP GPU - [$] is used, it is not implemented yet."
.MS S 1201 "OpenMP GPU - [$] is used with [$], this usage is not implemented yet."
.MS S 1202 "OpenMP GPU - [$] is used independently than [$], this usage is not implemented yet."
.MS S 1204 "OpenMP GPU - Internal compiler error. Reason: [$] at [$]"
@@ -1,5 +1,5 @@
/*
* Copyright (c) 2017, NVIDIA CORPORATION. All rights reserved.
* Copyright (c) 2015-2019, NVIDIA CORPORATION. All rights reserved.
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
@@ -94,7 +94,37 @@ _mp_ecs_nest(void)
omp_unset_nest_lock(&nest_lock);
}

// This lock is used only for the reduction, using the same locks
// `_mp_bcs/ecs` was causing deadlocks when a function that contains a
// reductions was being called directly in a print/write statement
// (those locks are used to make the print thread safe and when used
// in conjunction with a reduction the same lock was being called
// twice by different threads causing the deadlock)
static kmp_critical_name nest_sem_red;
static omp_nest_lock_t nest_lock_red;

static int is_init_nest_red = 0;

void
_mp_bcs_nest_red(void)
{
if (!is_init_nest_red) {
_mp_p(&nest_sem_red);
if (!is_init_nest_red) {
omp_init_nest_lock(&nest_lock_red);
is_init_nest_red = 1;
}
_mp_v(&nest_sem_red);
}
omp_set_nest_lock(&nest_lock_red);
}

void
_mp_ecs_nest_red(void)
{
omp_unset_nest_lock(&nest_lock_red);
}
// end reduction locks

/* allocate and initialize a thread-private common block */

@@ -1184,6 +1184,8 @@ dast(int astx)
A_ENDLABP(0, 0);
putnzint("procbind", A_PROCBINDG(0));
A_PROCBINDP(0, 0);
putnzint("num_threads", A_NPARG(0));
A_NPARP(0, 0);
break;
case A_MP_TEAMS:
putnzint("lop", A_LOPG(0));
@@ -76,6 +76,7 @@ FLG flg = {
FALSE, /* -nosequence */
25, /* errorlimit */
FALSE, /* don't allow smp directives */
FALSE, /* omptarget - don't allow OpenMP Offload directives */
0, /* tpcount */
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, /* tpvalue */
};
@@ -214,8 +214,8 @@ typedef struct {
LOGICAL defaulthpf;
LOGICAL defaultsequence;
int errorlimit;
LOGICAL omptarget; /* TRUE => allow omp accel directives */
LOGICAL smp; /* TRUE => allow smp directives */
LOGICAL omptarget; /* TRUE => allow OpenMP Offload directives */
int tpcount;
int tpvalue[TPNVERSION]; /* target processor(s), for unified binary */
int accmp;
@@ -5215,10 +5215,30 @@ lower_stmt(int std, int ast, int lineno, int label)
}

if(flg.omptarget) {
int ilm_numteams=0, ilm_numthreads=0, ilm_threadlimit=0;
if(A_NTEAMSG(ast)) {
lower_expression(A_NTEAMSG(ast));
ilm_numteams = lower_conv(A_NTEAMSG(ast), DT_INT);
} else {
ilm_numteams = plower("oS", "ICON", lowersym.intzero);
}
if(A_THRLIMITG(ast)) {
lower_expression(A_THRLIMITG(ast));
ilm_threadlimit = lower_conv(A_THRLIMITG(ast), DT_INT);
} else {
ilm_threadlimit = plower("oS", "ICON", lowersym.intzero);
}
if(A_NPARG(ast)) {
lower_expression(A_NPARG(ast));
ilm_numthreads = lower_conv(A_NPARG(ast), DT_INT);
} else {
ilm_numthreads = plower("oS", "ICON", lowersym.intzero);
}
if(A_LOOPTRIPCOUNTG(ast) != 0) {
lower_omp_target_tripcount(A_LOOPTRIPCOUNTG(ast), std);
}
plower("on", "MP_TARGETMODE", A_COMBINEDTYPEG(ast));
plower("oniii", "MP_TARGETMODE", A_COMBINEDTYPEG(ast), ilm_numteams,
ilm_threadlimit, ilm_numthreads);
}

//pragmatype specifies combined type of target.
@@ -116,12 +116,17 @@ static int mk_atomic_update_intr(int, int);
static void do_map();
static LOGICAL use_atomic_for_reduction(int);

#ifdef OMP_OFFLOAD_LLVM
#if defined(OMP_OFFLOAD_LLVM) || defined(OMP_OFFLOAD_PGI)
static char *map_type;
bool isalways = false;
static int get_omp_combined_mode(BIGINT64 type);
static void mp_handle_map_clause(SST *, int, char *, int, int, bool);
static void mp_check_maptype(const char *maptype);
static LOGICAL is_in_omptarget(int d);
#endif
#ifdef OMP_OFFLOAD_LLVM
static void gen_reduction_ompaccel(REDUC *reducp, REDUC_SYM *reduc_symp,
LOGICAL rmme, LOGICAL in_parallel);
static OMP_TARGET_MODE get_omp_combined_mode(BIGINT64 type);
#endif

/*-------- define data structures and macros local to this file: --------*/
@@ -545,10 +550,7 @@ static int distchunk;
static int mp_iftype;
static ISZ_T kernel_do_nest;
static LOGICAL has_team = FALSE;
#ifdef OMP_OFFLOAD_LLVM
static char *map_type;
bool isalways = false;
#endif


static LOGICAL any_pflsr_private = FALSE;

@@ -1632,12 +1634,6 @@ semsmp(int rednum, SST *top)
* <mp stmt> ::= <targetupdate begin> <opt par list> |
*/
case MP_STMT47: {
#ifdef OMP_OFFLOAD_LLVM
if(flg.omptarget) {
error(1200, ERR_Severe, gbl.lineno, "target update",
NULL);
}
#endif
check_targetdata(OMP_TARGETUPDATE, "OMP TARGET UPDATE");
ast = mk_stmt(A_MP_TARGETUPDATE, 0);
if (CL_PRESENT(CL_IF)) {
@@ -1666,8 +1662,9 @@ semsmp(int rednum, SST *top)
clause_errchk(BT_TARGET, "OMP TARGET");
mp_create_bscope(0);
DI_BTARGET(sem.doif_depth) = emit_btarget(A_MP_TARGET);
#ifdef OMP_OFFLOAD_LLVM
A_COMBINEDTYPEP(DI_BTARGET(sem.doif_depth),
#if defined(OMP_OFFLOAD_LLVM) || defined(OMP_OFFLOAD_PGI)
if(flg.omptarget)
A_COMBINEDTYPEP(DI_BTARGET(sem.doif_depth),
get_omp_combined_mode(BT_TARGET));
#endif
par_push_scope(TRUE);
@@ -2459,9 +2456,6 @@ semsmp(int rednum, SST *top)
* <par attr> ::= <map clause> |
*/
case PAR_ATTR26:
#ifndef OMP_OFFLOAD_LLVM
error(547, ERR_Warning, gbl.lineno, "MAP", CNULL);
#endif
break;
/*
* <par attr> ::= <depend clause> |
@@ -2485,9 +2479,6 @@ semsmp(int rednum, SST *top)
* <par attr> ::= <motion clause> |
*/
case PAR_ATTR30:
#ifndef OMP_OFFLOAD_LLVM
error(547, ERR_Warning, gbl.lineno, "MOTION", CNULL);
#endif
break;
/*
* <par attr> ::= DIST_SCHEDULE ( <id name> <opt distchunk> ) |
@@ -3115,19 +3106,14 @@ semsmp(int rednum, SST *top)
* <map clause> ::= MAP ( <map item> )
*/
case MAP_CLAUSE1:
#ifdef OMP_OFFLOAD_LLVM
if (flg.omptarget)
break;
#endif
error(547, ERR_Warning, gbl.lineno, "MAP", CNULL);
break;

/* ------------------------------------------------------------------ */
/*
* <map item> ::= <accel data list> |
*/
case MAP_ITEM1:
#ifdef OMP_OFFLOAD_LLVM
#if defined(OMP_OFFLOAD_LLVM) || defined(OMP_OFFLOAD_PGI)
if (flg.omptarget) {
mp_handle_map_clause(top, CL_MAP, "tofrom", 1, DI_ID(sem.doif_depth),
isalways);
@@ -3138,7 +3124,7 @@ semsmp(int rednum, SST *top)
* <map item> ::= <map type> : <accel data list>
*/
case MAP_ITEM2:
#ifdef OMP_OFFLOAD_LLVM
#if defined(OMP_OFFLOAD_LLVM) || defined(OMP_OFFLOAD_PGI)
if (flg.omptarget) {
if (strlen(map_type) == 0)
error(1205, ERR_Severe, gbl.lineno, scn.id.name + SST_CVALG(RHS(1)), 0);
@@ -3155,7 +3141,7 @@ semsmp(int rednum, SST *top)
* <map type> ::= <id name> |
*/
case MAP_TYPE1:
#ifdef OMP_OFFLOAD_LLVM
#if defined(OMP_OFFLOAD_LLVM) || defined(OMP_OFFLOAD_PGI)
if (flg.omptarget) {
mp_check_maptype(scn.id.name + SST_CVALG(RHS(1)));
map_type = scn.id.name + SST_CVALG(RHS(1));
@@ -3166,15 +3152,14 @@ semsmp(int rednum, SST *top)
* <map type> ::= ALWAYS <opt comma> <id name>
*/
case MAP_TYPE2:
#ifdef OMP_OFFLOAD_LLVM
#if defined(OMP_OFFLOAD_LLVM) || defined(OMP_OFFLOAD_PGI)
if (flg.omptarget) {
mp_check_maptype(scn.id.name + SST_CVALG(RHS(1)));
map_type = scn.id.name + SST_CVALG(RHS(1));
isalways = true;
break;
}
#endif
error(547, ERR_Warning, gbl.lineno, "ALWAYS", CNULL);
break;

/* ------------------------------------------------------------------ */
@@ -3203,17 +3188,11 @@ semsmp(int rednum, SST *top)
* <motion clause> ::= TO ( <var ref list> ) |
*/
case MOTION_CLAUSE1:
#ifndef OMP_OFFLOAD_LLVM
error(547, ERR_Warning, gbl.lineno, "TO", CNULL);
#endif
break;
/*
* <motion clause> ::= FROM ( <var ref list> )
*/
case MOTION_CLAUSE2:
#ifndef OMP_OFFLOAD_LLVM
error(547, ERR_Warning, gbl.lineno, "FROM", CNULL);
#endif
break;

/* ------------------------------------------------------------------ */
@@ -4842,15 +4821,18 @@ semsmp(int rednum, SST *top)
* <accel data> ::= <accel data name> ( <accel sub list> ) |
*/
case ACCEL_DATA1:
#ifdef OMP_OFFLOAD_LLVM
if (SST_IDG(RHS(1)) == S_IDENT || SST_IDG(RHS(1)) == S_DERIVED) {
sptr = SST_SYMG(RHS(1));
} else {
sptr = SST_LSYMG(RHS(1));
#if defined(OMP_OFFLOAD_LLVM) || defined(OMP_OFFLOAD_PGI)
if(is_in_omptarget(sem.doif_depth)) {
//todo support array section in the map clause for openmp
if (SST_IDG(RHS(1)) == S_IDENT || SST_IDG(RHS(1)) == S_DERIVED) {
sptr = SST_SYMG(RHS(1));
} else {
sptr = SST_LSYMG(RHS(1));
}
error(1206, ERR_Warning, gbl.lineno, sptr ? SYMNAME(sptr) : CNULL, CNULL);
goto accel_data2;
break;
}
error(1206, ERR_Warning, gbl.lineno, sptr ? SYMNAME(sptr) : CNULL, CNULL);
goto accel_data2;
break;
#endif
accel_data1:
if (SST_IDG(RHS(1)) == S_IDENT || SST_IDG(RHS(1)) == S_DERIVED) {
@@ -7623,7 +7605,6 @@ do_copyprivate()
static void
do_map()
{
#ifdef OMP_OFFLOAD_LLVM
if (!flg.omptarget)
return;

@@ -7640,7 +7621,6 @@ do_map()
}
ast = mk_stmt(A_MP_EMAP, 0);
(void)add_stmt(ast);
#endif
}

static int
@@ -8618,7 +8598,7 @@ begin_combine_constructs(BIGINT64 construct)
LOGICAL do_enter = FALSE;

has_team = FALSE;
#ifdef OMP_OFFLOAD_LLVM
#if defined(OMP_OFFLOAD_LLVM) || defined(OMP_OFFLOAD_PGI)
combinedMode = get_omp_combined_mode(construct);
if (flg.omptarget) {
if (!CL_PRESENT(CL_SCHEDULE)) {
@@ -8714,8 +8694,6 @@ begin_combine_constructs(BIGINT64 construct)
DI_BPAR(doif) = emit_bpar();
par_push_scope(FALSE);
begin_parallel_clause(sem.doif_depth);

return;
}
if (BT_PAR & construct) {
if (do_enter) {
@@ -10152,7 +10130,9 @@ gen_reduction_ompaccel(REDUC *reducp, REDUC_SYM *reduc_symp, LOGICAL rmme,
current_red = current_red->next;
}
}
#endif /* OMP_OFFLOAD_LLVM */

#if defined(OMP_OFFLOAD_LLVM) || defined(OMP_OFFLOAD_PGI)
static void
mp_check_maptype(const char *maptype)
{
@@ -10212,7 +10192,7 @@ mp_handle_map_clause(SST *top, int clause, char *maptype, int op, int construct,
CL_LAST(clause) = itemend;
}

static OMP_TARGET_MODE
static int
get_omp_combined_mode(BIGINT64 type)
{
BIGINT64 combined_type;
@@ -10240,8 +10220,8 @@ get_omp_combined_mode(BIGINT64 type)
if ((type & BT_TARGET))
return mode_target;
return mode_none_target;
return -1;
}

#endif
/* Return FALSE if the sptr is presented in multiple
* data sharing clauses: (e.g., shared(x) private(x)),
@@ -10312,18 +10292,23 @@ check_map_data_sharing(int sptr)
return TRUE;
}

/**
* \brief Decide to use optimized atomic usage.
*/
LOGICAL use_opt_atomic(int d)
static LOGICAL is_in_omptarget(int d)
{
#ifdef OMP_OFFLOAD_LLVM
if(flg.omptarget && (DI_IN_NEST(d, DI_TARGET) ||
DI_IN_NEST(d, DI_TARGTEAMSDISTPARDO) ||
DI_IN_NEST(d, DI_TARGPARDO) ||
DI_IN_NEST(d, DI_TARGETSIMD) ||
DI_IN_NEST(d, DI_TARGTEAMSDIST)))
return TRUE;
return FALSE;
}
/**
* \brief Decide to use optimized atomic usage.
*/
LOGICAL use_opt_atomic(int d)
{
#ifdef OMP_OFFLOAD_LLVM
return is_in_omptarget(d);
#endif
return OPT_OMP_ATOMIC;
}
@@ -1540,7 +1540,7 @@ mklvalue(SST *stkptr, int stmt_type)
with the same name of induction variables.
*/
if (stmt_type == 0 && flg.smp && (SCG(sptr) != SC_PRIVATE) &&
(sem.expect_cuf_do || (sem.collapsed_acc_do && !sem.seq_acc_do))) {
sem.expect_cuf_do ) {
int newsptr;
newsptr = insert_sym(sptr);
DCLDP(newsptr, TRUE);
