Skip to content

Commit

Permalink
[OpenMP] Offloading descriptor registration and device codegen.
Browse files Browse the repository at this point in the history
Summary:
In order to offloading work properly two things need to be in place:
- a descriptor with all the offloading information (device entry functions, and global variable) has to be created by the host and registered in the OpenMP offloading runtime library.
- all the device functions need to be emitted for the device and a convention has to be in place so that the runtime library can easily map the host ID of an entry point with the actual function in the device.

This patch adds support for these two things. However, only entry functions are being registered given that 'declare target' directive is not yet implemented.

About offloading descriptor:

The details of the descriptor are explained with more detail in http://goo.gl/L1rnKJ. Basically the descriptor will have fields that specify the number of devices, the pointers to where the device images begin and end (that will be defined by the linker), and also pointers to a the begin and end of table whose entries contain information about a specific entry point. Each entry has the type:
```
struct __tgt_offload_entry{
 void *addr;
 char *name;
 int64_t size;
};
```  
and will be implemented in a pre determined (ELF) section `.omp_offloading.entries` with 1-byte alignment, so that when all the objects are linked, the table is in that section with no padding in between entries (will be like a C array). The code generation ensures that all `__tgt_offload_entry` entries are emitted in the same order for both host and device so that the runtime can have the corresponding entries in both host and device in same index of the table, and efficiently implement the mapping.

The resulting descriptor is registered/unregistered with the runtime library using the calls `__tgt_register_lib` and `__tgt_unregister_lib`. The registration is implemented in a high priority global initializer so that the registration happens always before any initializer (that can potentially include target regions) is run.

The driver flag -omptargets= was created to specify a comma separated list of devices the user wants to support so that the new functionality can be exercised. Each device is specified with its triple.


About target codegen:

The target codegen is pretty much straightforward as it reuses completely the logic of the host version for the same target region. The tricky part is to identify the meaningful target regions in the device side. Unlike other programming models, like CUDA, there are no already outlined functions with attributes that mark what should be emitted or not. So, the information on what to emit is passed in the form of metadata in host bc file. This requires a new option to pass the host bc to the device frontend. Then everything is similar to what happens in CUDA: the global declarations emission is intercepted to check to see if it is an "interesting" declaration. The difference is that instead of checking an attribute, the metadata information in checked. Right now, there is only a form of metadata to pass information about the device entry points (target regions). A class `OffloadEntriesInfoManagerTy` was created to manage all the information and queries related with the metadata. The metadata looks like this:
```
!omp_offload.info = !{!0, !1, !2, !3, !4, !5, !6}

!0 = !{i32 0, i32 52, i32 77426347, !"_ZN2S12r1Ei", i32 479, i32 13, i32 4}
!1 = !{i32 0, i32 52, i32 77426347, !"_ZL7fstatici", i32 461, i32 11, i32 5}
!2 = !{i32 0, i32 52, i32 77426347, !"_Z9ftemplateIiET_i", i32 444, i32 11, i32 6}
!3 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 99, i32 11, i32 0}
!4 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 272, i32 11, i32 3}
!5 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 127, i32 11, i32 1}
!6 = !{i32 0, i32 52, i32 77426347, !"_Z3fooi", i32 159, i32 11, i32 2}
```
The fields in each metadata entry are (in sequence):
Entry 1) an ID of the type of metadata - right now only zero is used meaning "OpenMP target region".
Entry 2) a unique ID of the device where the input source file that contain the target region lives. 
Entry 3) a unique ID of the file where the input source file that contain the target region lives. 
Entry 4) a mangled name of the function that encloses the target region.
Entries 5) and 6) line and column number where the target region was found.
Entry 7) is the order the entry was emitted.

Entry 2) and 3) are required to distinguish files that have the same function name.
Entry 4) is required to distinguish different instances of the same declaration (usually templated ones)
Entries 5) and 6) are required to distinguish the particular target region in body of the function (it is possible that a given target region is not an entry point - if clause can evaluate always to zero - and therefore we need to identify the "interesting" target regions. )

This patch replaces http://reviews.llvm.org/D12306.

Reviewers: ABataev, hfinkel, tra, rjmccall, sfantao

Subscribers: FBrygidyn, piotr.rak, Hahnfeld, cfe-commits

Differential Revision: http://reviews.llvm.org/D12614

llvm-svn: 256842
  • Loading branch information
Samuel Antao committed Jan 5, 2016
1 parent 411af72 commit 4d5f0bb
Show file tree
Hide file tree
Showing 18 changed files with 1,713 additions and 162 deletions.
3 changes: 3 additions & 0 deletions clang/include/clang/Basic/DiagnosticDriverKinds.td
Original file line number Diff line number Diff line change
Expand Up @@ -123,6 +123,9 @@ def err_drv_emit_llvm_link : Error<
def err_drv_optimization_remark_pattern : Error<
"%0 in '%1'">;
def err_drv_no_neon_modifier : Error<"[no]neon is not accepted as modifier, please use [no]simd instead">;
def err_drv_invalid_omp_target : Error<"OpenMP target is invalid: '%0'">;
def err_drv_omp_host_ir_file_not_found : Error<
"The provided host compiler IR file '%0' is required to generate code for OpenMP target regions but cannot be found.">;

def warn_O4_is_O3 : Warning<"-O4 is equivalent to -O3">, InGroup<Deprecated>;
def warn_drv_lto_libpath : Warning<"libLTO.dylib relative to clang installed dir not found; using 'ld' default search path instead">,
Expand Down
2 changes: 2 additions & 0 deletions clang/include/clang/Basic/LangOptions.def
Original file line number Diff line number Diff line change
Expand Up @@ -165,6 +165,8 @@ LANGOPT(HalfArgsAndReturns, 1, 0, "half args and returns")
LANGOPT(CUDA , 1, 0, "CUDA")
LANGOPT(OpenMP , 1, 0, "OpenMP support")
LANGOPT(OpenMPUseTLS , 1, 0, "Use TLS for threadprivates or runtime calls")
LANGOPT(OpenMPIsDevice , 1, 0, "Generate code only for OpenMP target device")

LANGOPT(CUDAIsDevice , 1, 0, "Compiling for CUDA device")
LANGOPT(CUDAAllowHostCallsFromHostDevice, 1, 0, "Allow host device functions to call host functions")
LANGOPT(CUDADisableTargetCallChecks, 1, 0, "Disable checks for call targets (host, device, etc.)")
Expand Down
10 changes: 9 additions & 1 deletion clang/include/clang/Basic/LangOptions.h
Original file line number Diff line number Diff line change
Expand Up @@ -108,7 +108,15 @@ class LangOptions : public LangOptionsBase {

/// \brief Options for parsing comments.
CommentOptions CommentOpts;


/// \brief Triples of the OpenMP targets that the host code codegen should
/// take into account in order to generate accurate offloading descriptors.
std::vector<llvm::Triple> OMPTargetTriples;

/// \brief Name of the IR file that contains the result of the OpenMP target
/// host code generation.
std::string OMPHostIRFile;

LangOptions();

// Define accessors/mutators for language options of enumeration type.
Expand Down
9 changes: 9 additions & 0 deletions clang/include/clang/Driver/CC1Options.td
Original file line number Diff line number Diff line change
Expand Up @@ -677,6 +677,15 @@ def fcuda_include_gpubinary : Separate<["-"], "fcuda-include-gpubinary">,
def fcuda_target_overloads : Flag<["-"], "fcuda-target-overloads">,
HelpText<"Enable function overloads based on CUDA target attributes.">;

//===----------------------------------------------------------------------===//
// OpenMP Options
//===----------------------------------------------------------------------===//

def fopenmp_is_device : Flag<["-"], "fopenmp-is-device">,
HelpText<"Generate code only for an OpenMP target device.">;
def omp_host_ir_file_path : Separate<["-"], "omp-host-ir-file-path">,
HelpText<"Path to the IR file produced by the frontend for the host.">;

} // let Flags = [CC1Option]


Expand Down
2 changes: 2 additions & 0 deletions clang/include/clang/Driver/Options.td
Original file line number Diff line number Diff line change
Expand Up @@ -1649,6 +1649,8 @@ def nostdlib : Flag<["-"], "nostdlib">;
def object : Flag<["-"], "object">;
def o : JoinedOrSeparate<["-"], "o">, Flags<[DriverOption, RenderAsInput, CC1Option, CC1AsOption]>,
HelpText<"Write output to <file>">, MetaVarName<"<file>">;
def omptargets_EQ : CommaJoined<["-"], "omptargets=">, Flags<[DriverOption, CC1Option]>,
HelpText<"Specify comma-separated list of triples OpenMP offloading targets to be supported">;
def pagezero__size : JoinedOrSeparate<["-"], "pagezero_size">;
def pass_exit_codes : Flag<["-", "--"], "pass-exit-codes">, Flags<[Unsupported]>;
def pedantic_errors : Flag<["-", "--"], "pedantic-errors">, Group<pedantic_Group>, Flags<[CC1Option]>;
Expand Down
Loading

0 comments on commit 4d5f0bb

Please sign in to comment.