Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HPXC initialization causes HPX to hang #17

Closed
Pansysk75 opened this issue Aug 8, 2023 · 6 comments
Closed

HPXC initialization causes HPX to hang #17

Pansysk75 opened this issue Aug 8, 2023 · 6 comments

Comments

@Pansysk75
Copy link
Member

Pansysk75 commented Aug 8, 2023

This an HPXMP issue, but the issue was caused by a change in HPXC, so I'm posting it here for now.

I came across issue this when using HPXMP, for example in this sample code: https://github.com/STEllAR-GROUP/hpxmp_meta/blob/main/running_examples/hello.cpp

@rtohid
The issue is most likely introduced by commit 0e37553
Here's what i found:

The following function is called on initialization of the kmp(hpxmp) runtime

thread_handle* get_thread_data(hpx::threads::thread_id_type id)
{
thread_handle* thandle;
std::size_t th_data = hpx::threads::get_thread_data(id);
if (th_data)
{
thandle = reinterpret_cast<thread_handle*>(th_data);
}
else
{
thandle = new thread_handle(id);
hpx::threads::set_thread_data(id, reinterpret_cast<std::size_t>(thandle));
}
// thread_handle* thandle = new thread_handle(id);
HPX_ASSERT(thandle);
HPX_ASSERT(thandle->magic == MAGIC);
return thandle;
}

Here's the stack trace that occurs:

hpx_main
 __kmp_api_GOMP_parallel ()
   __kmp_get_global_thread_id_reg ()
    __kmp_do_serial_initialize (...)
     __kmp_register_root (...)
      __kmp_gtid_set_specific (...)
       hpxc_setspecific(...)
        get_thread_data(...)

This means that get_thread_data() is called from the hpx_main hpx thread.

The line
thandle = new thread_handle(id);
creates a thread_handle for hpx_main.

This is bad, because hpx_main doesn't have the wrapper_function() that would allow it to delete the thread_handle object on completion.
So, that thread_handle stays alive for ever, keeping a hpx::threads::thread_id_ref_type, and thus never letting the hpx runtime shut down.

I would propose a solution for this, but I do not know what this fix was trying to get around, so I might end up repeating a previous problem.

Let me know if I can help any further.

@Pansysk75 Pansysk75 changed the title HPCX initialization causes HPX to hang HPXC initialization causes HPX to hang Aug 8, 2023
@hkaiser
Copy link
Member

hkaiser commented Aug 9, 2023

It may be better to store the thread id for hpx_main as a non-reference counting id (i.e., threads_id_type) in this case.

@rtohid
Copy link
Contributor

rtohid commented Aug 9, 2023

Thank you, @Pansysk75. If I remember correctly, the issue was in hpxc_setspecific, where the thandle would be null for hpx_main, and what you're referring to was supposed to be the workaround.

@Pansysk75
Copy link
Member Author

Pansysk75 commented Aug 9, 2023

@hkaiser I was looking at a fix using hpx::threads::add_thread_exit_callback() to add a callback function to hpx_main. The callback doesn't seem to get called, even though hpx_main terminates :((
Should that work in theory, or am I not supposed to use it like that?

thread_handle(hpx::threads::thread_id_ref_type id)
:
. . .
{
        auto on_completion = [thandle = this]() {
            std::cout << "This print is not printing" << std::endl;
            // Signal completion
            thandle->promise.set_value(nullptr);
            int r = --thandle->refc;
            if (r == 0)
            {
                delete thandle;
            }
        };

        if (!hpx::threads::add_thread_exit_callback(id.noref(), on_completion))  <--------
        {
            // Failed to add callback because the hpx thread has terminated
            // This throw does not happen
            throw;
        }
}

The rationale of doing this, is to retain the ability to have hpxc-thread-data even for threads not launched through the hpxc API, and have that data be deleted when the thread finished (just like we do for the other threads spawned with hpxc_thread_create)

@hkaiser
Copy link
Member

hkaiser commented Aug 9, 2023

@Pansysk75 now I'm confused. I thought the whole problem was that hpx_main was not terminating, in which case it would be logical for the callback to never be called...

@Pansysk75
Copy link
Member Author

@Pansysk75 now I'm confused. I thought the whole problem was that hpx_main was not terminating, in which case it would be logical for the callback to never be called...

Ah, it's terminating, holding the reference just keeps it from getting deleted after termination

@hkaiser
Copy link
Member

hkaiser commented Aug 9, 2023

@Pansysk75 now I'm confused. I thought the whole problem was that hpx_main was not terminating, in which case it would be logical for the callback to never be called...

Ah, it's terminating, holding the reference just keeps it from getting deleted after termination

Well, in this case the callback should be invoked (see: https://github.com/STEllAR-GROUP/hpx/blob/e7c31a49dd4e550a61134e498f3ecfb585d201cb/libs/core/threading_base/include/hpx/threading_base/register_thread.hpp#L72)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants