Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[C++4OpenCL] Fix generation of __cxa_atexit #47391

Open
AnastasiaStulova opened this issue Nov 2, 2020 · 2 comments
Open

[C++4OpenCL] Fix generation of __cxa_atexit #47391

AnastasiaStulova opened this issue Nov 2, 2020 · 2 comments
Labels
bugzilla opencl

Comments

@AnastasiaStulova
Copy link
Contributor

@AnastasiaStulova AnastasiaStulova commented Nov 2, 2020

Bugzilla Link 48047
Version trunk
OS All
CC @AnastasiaStulova

Extended Description

Currently global dtor callback registration stub has the prototype that ignores pointer address spaces

i32 @​__cxa_atexit(void (i8*), i8, i8 addrspace(1)*)

The address space of object parameter of destructor and argument of __cxa_atexit is not taken into account and therefore when address space appears in the destructor or an object the generated IR is not functional.

Some ideas to fix this is proposed in https://reviews.llvm.org/D62413#1583079.

This suggests that for OpenCL we could generate a custom function (one per each global object to be destoried). This function will be passed into func ptr parameter of __cxa_atexit instead of dtor itself how it is done now. The new function will contain code to invoke dtor on the right object directly i.e. for the following example

struct S {
~S(){};
};

S s;

The proposed IR generated equivalent should be:

void __dtor_s(){ // The naming scheme is to be determined
~S(s); // NOTE if address spaces are mismatched we need a conversion. Sema should already check its validity (?)
}

void @​__cxx_global_var_init()
{
@​__cxa_atexit(__dtor_S, null, null);
}

Note that currently clang doesn't generate __dtor_s() and it passes dtor itself as an argument to @​__cxa_atexit, see https://godbolt.org/z/4soT5r

The advantage of the proposed approach is that it can work for any address space uniformly and it makes implementation of @​__cxa_atexit for various addres spaces very straight forward. There is no need to change default ABI at all. However it might be good to standardise what clang generates in a long term. This works for global variables and static local destruction too both in gloabl and constant address spaces.

The drawback is that we need to generate an extra function for each global object. To avoid this we could introduce some sort of overloading for @​__cxa_atexit that could take dtor function with different parameters as well as object argument in different address spaces. It seems like a large change to ABI and potentially not backward compatible. There could also be some hybrid schemes where only @​__cxa_atexit with address spaces is to be mangled. However both options makes @​__cxa_atexit implementation non-trivial.

@AnastasiaStulova
Copy link
Contributor Author

@AnastasiaStulova AnastasiaStulova commented Nov 2, 2020

For completenes I provide pseudo code explaining possible implementation of CXX ABI that should be executed on the device side when the program is terminated.

struct __cxx_atexit_entry{
void* dtor;
};

// TODO: where is this created and how is it set on GPU side? (can be set via ctor parameters? advantage is that is can be implemented by the applications)
// This list should be global to the entire program in case it is composed of multiple translation units
extern __cxx_atexit_entry *__dtor_list;

// TODO: can be taken from modules metadata that counts number of cxa_atexit calls or can be set by an API
extern size_t *__dtor_max_count;

void __cxx_atexit(void *f, void *arg, void *dsohandler)
{
size_t dtor_count = 0;
__dtor_list[dtor_count].dtor = f;
dtor_count++;
//TODO: assert(dtor_count < *__dtor_max_count);
}

extern void __cxx_invoke_dtor(void *f, void *arg); //TODO: CodeGen'ed, Clang builtin or lib function compiled with func ptr extension.

void __cxx_finalize()
{
for (size_t i = 0 ; i < *__dtor_max_count; i++)
{
__cxx_invoke_dtor(__dtor_list[i].dtor);
}
}

// This kernel can be enqueued when the progam context is destroyed.
__kernel __exit(){
__cxx_finalize();
}

@AnastasiaStulova
Copy link
Contributor Author

@AnastasiaStulova AnastasiaStulova commented Nov 27, 2020

The design from standard C++ requires an extension for function pointers to be available or at least some limited form of it. As this is not portable across all implementations, there seems to be no way to provide global dtors all targets unless we significantly change C++ flow. This seems too expensive to justify at present. Therefore it is suggested to add a feature macro indicating whether global dtor's support is present or not - for example __opencl_cxx_global_dtors. The macro should not be defined by default but only for targets that support global dtors.

@llvmbot llvmbot transferred this issue from llvm/llvm-bugzilla-archive Dec 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bugzilla opencl
Projects
None yet
Development

No branches or pull requests

1 participant