Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Return 0 if there is no CFP on the EC yet #7116

Merged
merged 1 commit into from
Jan 13, 2023

Conversation

tenderlove
Copy link
Member

StackProf uses a signal handler to call rb_profile_frames. Signals are delivered to threads randomly, and can be delivered after the thread has been created but before the CFP has been established on the EC.

This commit returns early if there is no CFP to use.

Here is some info from the core files we are seeing. Here you can see the CFP on the current EC is 0x0:

(gdb) p ruby_current_ec
$20 = (struct rb_execution_context_struct *) 0x7f3481301b50
(gdb) p ruby_current_ec->cfp
$21 = (rb_control_frame_t *) 0x0

Here is where VM_FRAME_CFRAME_P gets a 0x0 CFP:

6  VM_FRAME_CFRAME_P (cfp=0x0) at vm_core.h:1350
7  VM_FRAME_RUBYFRAME_P (cfp=<optimized out>) at vm_core.h:1350
8  rb_profile_frames (start=0, limit=2048, buff=0x7f3493809590, lines=0x7f349380d590) at vm_backtrace.c:1587

Down the stack we can see this is happening after thread creation:

19 0x00007f3495bf9420 in <signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
20 0x000055d531574e55 in thread_start_func_2 (th=<optimized out>, stack_start=<optimized out>) at thread.c:676
21 0x000055d531575b31 in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:1170
22 0x00007f3495bed609 in start_thread (arg=<optimized out>) at pthread_create.c:477
23 0x00007f3495b12133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

StackProf uses a signal handler to call `rb_profile_frames`.  Signals
are delivered to threads randomly, and can be delivered after the thread
has been created but before the CFP has been established on the EC.

This commit returns early if there is no CFP to use.

Here is some info from the core files we are seeing.  Here you can see
the CFP on the current EC is 0x0:

```
(gdb) p ruby_current_ec
$20 = (struct rb_execution_context_struct *) 0x7f3481301b50
(gdb) p ruby_current_ec->cfp
$21 = (rb_control_frame_t *) 0x0
```

Here is where VM_FRAME_CFRAME_P gets a 0x0 CFP:

```
6  VM_FRAME_CFRAME_P (cfp=0x0) at vm_core.h:1350
7  VM_FRAME_RUBYFRAME_P (cfp=<optimized out>) at vm_core.h:1350
8  rb_profile_frames (start=0, limit=2048, buff=0x7f3493809590, lines=0x7f349380d590) at vm_backtrace.c:1587
```

Down the stack we can see this is happening after thread creation:

```
19 0x00007f3495bf9420 in <signal handler called> () at /lib/x86_64-linux-gnu/libpthread.so.0
20 0x000055d531574e55 in thread_start_func_2 (th=<optimized out>, stack_start=<optimized out>) at thread.c:676
21 0x000055d531575b31 in thread_start_func_1 (th_ptr=<optimized out>) at thread_pthread.c:1170
22 0x00007f3495bed609 in start_thread (arg=<optimized out>) at pthread_create.c:477
23 0x00007f3495b12133 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
@k0kubun
Copy link
Member

k0kubun commented Jan 13, 2023

For this to be safe, I thought ec->cfp must become non-zero only after its end_cfp (ec->vm_stack + ec->vm_stack_size) becomes non-zero. If we always use rb_ec_initialize_vm_stack to initialize those fields, because ec->cfp is set using end_cfp, ec->cfp is set afterwards and the assignment order is unlikely (maybe impossible) to be reversed. So I guess (or hope) that case will be safe with this patch.

@tenderlove tenderlove merged commit eab7f46 into ruby:master Jan 13, 2023
@tenderlove tenderlove deleted the thread-resilience branch January 13, 2023 01:11
ivoanjo added a commit to DataDog/dd-trace-rb that referenced this pull request Jan 25, 2023
…ile_frames`

**What does this PR do?**:

This PR imports the fix in ruby/ruby#7116 into
the custom version of `rb_profile_frames` that we ship inside ddtrace.

**Motivation**:

As I mention in the inline comments, I don't expect this extra safety
check to ever be triggered, but just in case I decided to add it
anyway, since the cost of getting my expectation wrong is a VM crash.

**Additional Notes**:

N/A

**How to test the change?**:

Adding test coverage for this is... difficult. Our current test of
this logic is indirect (e.g. via our own code for sampling), and
reproducing the state to trigger this is hard.

So for now I only added the check without extra testing.
casperisfine added a commit to casperisfine/vernier that referenced this pull request Nov 22, 2023
3.2.0 has a bug that cause `rb_profile_frames` to not be async signal safe: ruby/ruby#7116

It has been backported in 3.2.1, so might as well prevent the installation of the gem.

Fix: jhawthorn#42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

2 participants