Offloading with GCC fails as not all .gnu.offload_funcs sections are merged #1196

Closed
tob2 opened this issue Feb 19, 2024 · 4 comments

@tob2

tob2 commented Feb 19, 2024

Followup to #1190 / #1188.

Summary: Device side now okay (thanks!) but required host-side data missing.

The entries (well, one entry) in the '.gnu.offload_funcs' section of 'file-1.c' (→ #1190 for the testcase) are ignored, leading to too few entries in a table and a run-time failure.


Namely:

Background: For offloading, the run-time library needs to map a host function to a device function. It does so by creating an array of host-function pointers on one side and of device functions on the other side – and then, when the n-th host function is used, the run-time library calls the n-th device function.
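To illustrate the index-based mapping described above, here is a minimal sketch. It is not libgomp's actual code: `device_fn` and `lookup_device_fn` are hypothetical names made up for this example, and the device side is reduced to a plain struct.

```c
#include <stddef.h>

/* Hypothetical stand-in for a device-side function handle. */
typedef struct { const char *name; } device_fn;

/* Return the device entry stored at the same index as host_fn in
   host_table, or NULL if host_fn is not registered or the two tables
   differ in length (the situation this issue is about).  */
const device_fn *
lookup_device_fn (const void *const *host_table, size_t n_host,
                  const device_fn *device_table, size_t n_device,
                  const void *host_fn)
{
  if (n_host != n_device)
    return NULL;                /* Tables out of sync: no safe mapping.  */
  for (size_t i = 0; i < n_host; i++)
    if (host_table[i] == host_fn)
      return &device_table[i];  /* n-th host entry maps to n-th device entry.  */
  return NULL;
}
```

The point of the sketch is that the mapping relies purely on position, so a host-side table that is missing entries makes the whole lookup unusable.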

The device-side code generation is handled via the LTO plugin as described in #1190 (now working) or at https://gcc.gnu.org/wiki/Offloading#Compilation_process – the latter also describes the following:

As the host side might be processed without LTO, the symbols are collected by writing them into special sections. GCC 14 has:

  • .gnu.offload_funcs (host → device function mapping)
  • .gnu.offload_vars (host → device global variable mapping)
  • .gnu.offload_ind_funcs (new since GCC 14: for device → host function mapping for some functions; used to permit passing host function pointers to the device and calling the device function there)

Those are constructed as follows:

(A) The compiler links crtoffloadbegin.o and crtoffloadend.o, which are created from https://github.com/gcc-mirror/gcc/blob/master/libgcc/offloadstuff.c. Only one section / array is shown here.

The first file contains – note the explicit setting of the section name:

#define OFFLOAD_FUNC_TABLE_SECTION_NAME ".gnu.offload_funcs"

const void *const __offload_func_table[0]
  __attribute__ ((__used__, visibility ("hidden"),
		  section (OFFLOAD_FUNC_TABLE_SECTION_NAME))) = { };

the last file contains:

const void *const __offload_funcs_end[0]
  __attribute__ ((__used__, visibility ("hidden"),
		  section (OFFLOAD_FUNC_TABLE_SECTION_NAME))) = { };

And there is additionally crtoffloadtable.o, which shows how it is used.

extern const void *const __offload_func_table[];
extern const void *const __offload_funcs_end[];

const void *const __OFFLOAD_TABLE__[]
  __attribute__ ((__visibility__ ("hidden"))) =
{
  &__offload_func_table, &__offload_funcs_end,

(B) The file-1.c – see #1190 for the used example – contains the following:

        .type   .offload_func_table, @object
        .size   .offload_func_table, 8
.offload_func_table:
        .quad   main._omp_fn.0

The __OFFLOAD_TABLE__ is then passed (for each offload target) as an argument to the run-time library together with information about the device side. The number of entries is given by ((uintptr_t)&__offload_funcs_end - (uintptr_t)&__offload_func_table)/sizeof(void*).
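The entry count formula can be written as a small helper. This is only a sketch: `offload_entry_count` is a hypothetical name, and `begin` / `end` stand for the addresses of `__offload_func_table` and `__offload_funcs_end` after the linker has merged all '.gnu.offload_funcs' input sections between them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Number of pointer-sized entries between the begin marker (from
   crtoffloadbegin.o) and the end marker (from crtoffloadend.o).  */
size_t
offload_entry_count (const void *const *begin, const void *const *end)
{
  /* The end marker must not precede the begin marker.  */
  assert ((uintptr_t) end >= (uintptr_t) begin);
  return ((uintptr_t) end - (uintptr_t) begin) / sizeof (void *);
}
```

If the linker places nothing between the two markers, `begin == end` and the helper returns 0, which matches the empty host-side table seen in this report.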


Current run-time result with MOLD:

libgomp: Cannot map target functions or variables (expected 0, have 2)  # Read as: (expected 0, have 1)

(The '2' instead of '1' is due to a separately loaded entry, i.e. '2' is correct but misleading.)

Thus, on the host side the '.gnu.offload_funcs' array is empty – missing the 'main._omp_fn.0' entry from 'file-1.s' – i.e. there is a problem with section merging.
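For illustration, the mismatch behind that message can be sketched as a simple count comparison. This is an assumption about the general shape of the check, not libgomp's implementation; `check_offload_counts` is a hypothetical name, and only the message text is taken from the output above.

```c
#include <stddef.h>
#include <stdio.h>

/* Compare the number of host-side table entries with the number of
   entries reported for the device image.  Returns 0 if they agree;
   otherwise writes a diagnostic into buf and returns -1.  */
int
check_offload_counts (size_t host_entries, size_t device_entries,
                      char *buf, size_t buflen)
{
  if (host_entries == device_entries)
    return 0;
  snprintf (buf, buflen,
            "Cannot map target functions or variables (expected %zu, have %zu)",
            host_entries, device_entries);
  return -1;
}
```

With an empty host-side array (0 entries) and 2 device-side entries, the sketch fails in the same way as the run shown above.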

@rui314
Owner

rui314 commented Feb 20, 2024

The assembled result of file1.c doesn't seem to contain .offload_func_table in my opensuse/tumbleweed docker image test environment. What am I missing?

Here is the full output of assembly: https://gist.github.com/rui314/43d92d355b419fb9d2310df05302500b

@tob2
Author

tob2 commented Feb 20, 2024

Looks as if I mixed up the file names – try file-2.c.

(It is the file that contains '#pragma omp target', as that is where the host code needs to invoke the device code, i.e. the code inside the block following the pragma is run on the device while the arguments to the pragma are processed on the host. The function name is that of the containing function, here "main", followed by _omp_fn and a per-file sequence number.)
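A minimal sketch of such a target region, for readers unfamiliar with the construct. The function name `add_on_device` is made up for this example and is not from the testcase in #1190. Following the naming scheme described above, the outlined region would presumably be called `add_on_device._omp_fn.0` when built with -fopenmp.

```c
/* The block after the pragma is the part that is outlined and run on
   the device; the map clauses are processed on the host.  Without
   -fopenmp the pragma is ignored and the block runs on the host.  */
int
add_on_device (int a, int b)
{
  int r = 0;
#pragma omp target map(to: a, b) map(from: r)
  {
    r = a + b;
  }
  return r;
}
```

The host-side object file for such a function is the one that carries the '.gnu.offload_funcs' entry pointing at the outlined function.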

I pasted my .s file as comment to your gist: https://gist.github.com/rui314/43d92d355b419fb9d2310df05302500b#gistcomment-4918366

rui314 closed this as completed in 8090737 Feb 21, 2024
@rui314
Owner

rui314 commented Feb 21, 2024

Can you try again with the above commit?

@tob2
Author

tob2 commented Feb 21, 2024

Yes, I can confirm that it now works.
Thanks!
